SYNTHETIC PEPTIDES AND USES THEREFORE 
FIELD OF THE INVENTION 

THIS INVENTION relates generally to agents for modulating immune responses. 
More particularly, the present invention relates to a synthetic polypeptide comprising a 
plurality of different segments of a parent polypeptide, wherein the segments are linked to 
each other such that one or more functions of the parent polypeptide are impeded, 
abrogated or otherwise altered and such that the synthetic polypeptide, when introduced 
into a suitable host, can elicit an immune response against the parent polypeptide. The 
invention also relates to synthetic polynucleotides encoding the synthetic polypeptides and 
to synthetic constructs comprising these polynucleotides. The invention further relates to 
the use of the polypeptides and polynucleotides of the invention in compositions for 
modulating immune responses. The invention also extends to methods of using such 
compositions for prophylactic and/or therapeutic purposes. 

Bibliographic details of various publications referred to in this specification are 
collected at the end of the description. 

BACKGROUND OF THE INVENTION 

The modern reductionist approach to vaccine and therapy development has been 
pursued for a number of decades and attempts to focus only on those parts of pathogens or 
of cancer proteins which are relevant to the immune system. To date the performance of 
this approach has been relatively poor considering the vigorous research carried out and 
the number of effective vaccines and therapies that it has produced. This approach is still 
being actively pursued, however, despite its poor performance because vaccines developed 
using this approach are often extremely safe and because only by completely 
understanding the immune system can new vaccine strategies be developed. 

One area that has benefited greatly from research efforts is knowledge about how 
the adaptive immune system operates and more specifically how T and B cells learn to 
recognise specific parts of pathogens and cancers. T cells are mainly involved in 
cell-mediated immunity whereas B cells are involved in the generation of antibody-mediated 
immunity. The two most important types of T cells involved in adaptive cellular immunity 
are αβ CD8+ cytotoxic T lymphocytes (CTL) and CD4+ T helper lymphocytes. CTL are 
important mediators of cellular immunity against many viruses, tumours, some bacteria 
and some parasites because they are able to kill infected cells directly and secrete various 
factors which can have powerful effects on the spread of infectious organisms. CTLs 
recognise epitopes derived from foreign intracellular proteins, which are 8-10 amino acids 
long and which are presented by class I major histocompatibility complex (MHC) 
molecules (in humans called human leukocyte antigens - HLAs) (Jardetzky et al., 1991; 
Fremont et al., 1992; Rotzschke et al., 1990). T helper cells enhance and regulate CTL 
responses and are necessary for the establishment of long-lived memory CTL. They also 
inhibit infectious organisms by secreting cytokines such as IFN-γ. T helper cells recognise 
epitopes derived mostly from extracellular proteins which are 12-25 amino acids long and 
which are presented by class II MHC molecules (Chicz et al., 1993; Newcomb et al., 
1993). B cells, or more specifically the antibodies they secrete, are important mediators in 
the control and clearance of mostly extracellular organisms. Antibodies recognise mainly 
conformational determinants on the surface of organisms, for example, although 
sometimes they may recognise short linear determinants. 

Despite significant advances towards understanding how T and linear B cell 
epitopes are processed and presented to the immune system, the full potential of 
epitope-based vaccines has not been fully exploited. The main reason for this is the large number 
of different T cell epitopes, which have to be included into such vaccines to cover the 
extreme HLA polymorphism in the human population. The human HLA diversity is one of 
the main reasons why whole pathogen vaccines frequently provide better population 
coverage than subunit or peptide-based vaccine strategies. There is a range of 
epitope-based strategies though which have tried to solve this problem, e.g., peptide blends, peptide 
conjugates and polyepitope vaccines (i.e., comprising strings of multiple epitopes) (Dyall et 
al., 1995; Thomson et al., 1996; Thomson et al., 1998; Thomson et al., 1998). These 
approaches however will always be sub-optimal not only because of the slow pace of 
epitope characterisation but also because it is virtually impossible for them to cover every 
existing HLA polymorphism in the population. A number of strategies have sought to 
avoid both problems by not identifying epitopes and instead incorporating larger amounts 
of sequence information, e.g., approaches using whole genes or proteins and approaches 
that mix multiple protein or gene sequences together. The proteins used by these strategies 
however sometimes still function and therefore can compromise vaccine safety, e.g., whole 
cancer proteins. Alternative strategies have tried to improve the safety of vaccines by 
fragmenting the genes and expressing them either separately or as complex mixtures, e.g., 
library DNA immunisation, or by ligating such fragments back together. These approaches 
are still sub-optimal because they are too complex, generate poor levels of immunity, 
cannot guarantee that all proteins no longer function and/or that all fragments are present, 
which compromises substantially complete immunological coverage. 

The lack of a safe and efficient vaccine strategy that can provide substantially 
complete immunological coverage is an important problem, especially when trying to 
develop vaccines against rapidly mutating and persistent viruses such as HIV and hepatitis 
C virus, because partial population coverage could allow vaccine-resistant pathogens to 
re-emerge in the future. Human immunodeficiency virus (HIV) is an RNA lentivirus 
approximately 9 kb in length, which infects CD4+ T cells, causing T cell decline and AIDS 
typically 3-8 years after infection. It is currently the most serious human viral infection, 
evidenced by the number of people currently infected with HIV or who have died from 
AIDS, estimated by the World Health Organisation (WHO) and UNAIDS in their AIDS 
epidemic update (December 1999) to be 33.6 and 16.3 million people, respectively. The 
spread of HIV is also now increasing fastest in areas of the world where over half of the 
human population reside, hence an effective vaccine is desperately needed to curb the 
spread of this epidemic. Despite the urgency, an effective vaccine for HIV is still some 
way off because of delays in defining the correlates of immune protection, lack of a 
suitable animal model, existence of up to 8 different subtypes of HIV and a high HIV 
mutation rate. 

A significant amount of research has been carried out to try and develop a vaccine 
capable of generating neutralising antibody responses that can protect against field isolates 
of HIV. Despite these efforts, it is now clear that the variability, instability and 
inaccessibility of critical determinants on the HIV envelope protein will make it extremely 
difficult and perhaps impossible to develop such a vaccine (Kwong et al., 1998). The 
limited ability of antibodies to block HIV infection is also supported by the observation 
that development of AIDS correlates primarily with a reduction in CTL responsiveness to 
HIV and not to altered antibody levels (Ogg et al., 1998). Hence CTL-mediated and not 
antibody-mediated responses appear to be critical for maintaining the asymptomatic state 
in vivo. There is also some evidence to suggest that pre-existing HIV-specific CTL 
responses can block the establishment of a latent HIV infection. This evidence comes from 
a number of cases where individuals have generated HIV-specific CTL responses without 
becoming infected and appear to be protected from establishing latent HIV infections 
despite repeated virus exposure (Rowland-Jones et al., 1995; Panniani 1998). Taken 
together, these observations suggest that a vaccine capable of generating a broad range of 
strong CTL responses may be able to stop individuals from becoming latently infected 
with HIV or at least allow infected individuals to remain asymptomatic for life. Virtually 
all of the candidate HIV vaccines developed to date have been derived from subtype B 
HIV proteins (western world subtype) whereas the majority of the HIV infections 
worldwide are caused by subtypes A/E or C (E and A are similar except in the envelope 
protein) (referred to as developing world subtypes). Hence existing candidate vaccines may 
not be suitable for the more common HIV subtypes. Recently, there has been some 
evidence that B subtype vaccines may be partially effective against other common HIV 
subtypes (Rowland-Jones et al., 1998). Accordingly, there remains a need for a vaccine 
whose effectiveness is substantially complete against all isolates of all strains of 
HIV. 



SUMMARY OF THE INVENTION 

The present invention is predicated in part on a novel strategy for enhancing the 
efficacy of an immunopotentiating composition. This strategy involves utilising the 
sequence information of a parent polypeptide to produce a synthetic polypeptide that 
comprises a plurality of different segments of the parent polypeptide, which are linked 
sequentially together in a different arrangement relative to that of the parent polypeptide. 
As a result of this change in relationship, the sequence of the linked segments in the 
synthetic polypeptide is different to a sequence contained within the parent polypeptide. As 
more fully described hereinafter, the present strategy is used advantageously to cause 
significant disruption to the structure and/or function of the parent polypeptide while 
minimising the destruction of potentially useful epitopes encoded by the parent 
polypeptide. 

Thus, in one aspect of the present invention, there is provided a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage 
in the at least one parent polypeptide. 

In one embodiment, the synthetic polypeptide consists essentially of different 
segments of a single parent polypeptide. 

In an alternate embodiment, the synthetic polypeptide consists essentially of 
different segments of a plurality of different parent polypeptides. 

Suitably, said segments in said synthetic polypeptide are linked sequentially in a 
different order or arrangement relative to that of corresponding segments in said at least 
one parent polypeptide. 

Preferably, at least one of said segments comprises partial sequence identity or 
homology to one or more other said segments. The sequence identity or homology is 
preferably contained at one or both ends of said at least one segment. 
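
One way to obtain segments that share sequence identity at their ends is to cut the parent sequence into overlapping windows. The following is a minimal sketch only; the function name `overlapping_segments` and the segment length and overlap values are illustrative assumptions and are not taken from this specification.

```python
def overlapping_segments(seq, seg_len, overlap):
    """Cut `seq` into windows of `seg_len` residues; neighbours share `overlap` residues."""
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must be non-negative and smaller than seg_len")
    step = seg_len - overlap
    segments = []
    start = 0
    while True:
        segments.append(seq[start:start + seg_len])
        # Stop once the current window reaches the end of the parent sequence.
        if start + seg_len >= len(seq):
            break
        start += step
    return segments


# Hypothetical 10-residue parent, 4-residue segments overlapping by 2 residues.
segments = overlapping_segments("ABCDEFGHIJ", 4, 2)
print(segments)
```

Because each segment repeats the last residues of its predecessor, an epitope that spans a cut point in one segment is still present intact in the neighbouring segment.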

In another aspect, the invention resides in a synthetic polynucleotide encoding the 
synthetic polypeptide as broadly described above. 

According to yet another aspect, the invention contemplates a synthetic construct 
comprising a polynucleotide as broadly described above that is operably linked to a 
regulatory polynucleotide. 

In a further aspect of the invention, there is provided a method for producing a 
synthetic polynucleotide as broadly described above, comprising: 

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

Preferably, the method further comprises fragmenting the sequence of a respective 
parent polypeptide into fragments and linking said fragments together in a different 
relationship relative to their linkage in said parent polypeptide sequence. In a preferred 
embodiment of this type, the fragments are randomly linked together. 

Suitably, the method further comprises reverse translating the sequence of a 
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence 
encoding said parent polypeptide or said segment. In a preferred embodiment of this type, 
an amino acid of said parent polypeptide sequence is reverse translated to provide a codon, 
which has higher translational efficiency than other synonymous codons in a cell of 
interest. Suitably, an amino acid of said parent polypeptide sequence is reverse translated 
to provide a codon which, in the context of adjacent or local sequence elements, has a 
lower propensity of forming an undesirable sequence (e.g., a palindromic sequence or a 
duplicated sequence) that is refractory to the execution of a task (e.g., cloning or 
sequencing). 
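
The codon selection described above can be sketched as a greedy reverse translation: for each amino acid the highest-ranked codon is tried first, and a lower-ranked synonymous codon is used if the first choice would create an undesirable motif in the growing sequence. This is an illustrative sketch under stated assumptions: the function `reverse_translate`, the small `RANKED` table (its ranking is a placeholder, not data from this specification) and the example motif are all hypothetical.

```python
def reverse_translate(protein, ranked_codons, forbidden=()):
    """Greedy reverse translation that prefers high-ranked codons but avoids given motifs."""
    # A new occurrence of a motif must end inside the newly added codon, so it
    # lies entirely within the last (longest motif + 2) bases.
    window = max((len(m) for m in forbidden), default=0) + 2
    dna = ""
    for aa in protein:
        choices = ranked_codons[aa]
        chosen = choices[0]  # fallback if every synonymous codon creates a motif
        for codon in choices:
            tail = (dna + codon)[-window:]
            if not any(motif in tail for motif in forbidden):
                chosen = codon
                break
        dna += chosen
    return dna


# Placeholder ranking for five residues only (assumed order, for illustration).
RANKED = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAG", "AAA"],
    "E": ["GAG", "GAA"],
    "D": ["GAC", "GAT"],
}

print(reverse_translate("MKEDW", RANKED))
print(reverse_translate("MKEDW", RANKED, forbidden=("GAGGAC",)))
```

In the second call the first-choice codon for D would complete the hypothetical motif, so the second-ranked codon is used at that position while the encoded protein is unchanged.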

In another aspect, the invention encompasses a computer program product for 
designing the sequence of a synthetic polypeptide as broadly described above, comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that links together said fragments in a different relationship relative to their 
linkage in said parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

In yet another aspect, the invention provides a computer program product for 
designing the sequence of a synthetic polynucleotide as broadly described above, 
comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a 
nucleic acid sequence encoding said fragment; 

- code that links together in the same reading frame each said nucleic acid 
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in 
which said fragments are linked together in a different relationship relative to their 
linkage in the at least one parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 
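
The steps listed above (receive parent sequences, fragment, reverse translate, link in frame in a different order) can be sketched end to end as follows. This is a simplified illustration, not the program of the invention: the names `design_polynucleotide` and `translate`, the one-codon-per-residue `CODON` table covering only six residues, and the two toy parent sequences are assumptions made for the example.

```python
import random

# Illustrative one-codon-per-residue table (an assumption; only six residues covered).
CODON = {"M": "ATG", "W": "TGG", "K": "AAG", "E": "GAG", "D": "GAC", "F": "TTC"}
AMINO = {codon: aa for aa, codon in CODON.items()}


def design_polynucleotide(parents, frag_len, seed=0):
    """Fragment each parent, reverse translate each fragment, relink in a new order."""
    # Fragment every parent polypeptide into consecutive pieces.
    fragments = [p[i:i + frag_len] for p in parents for i in range(0, len(p), frag_len)]
    # Reverse translate each fragment; whole codons keep every fragment in frame.
    coding = ["".join(CODON[aa] for aa in frag) for frag in fragments]
    # Shuffle until the order differs from the parent order.
    order = list(range(len(coding)))
    rng = random.Random(seed)
    while len(order) > 1 and order == sorted(order):
        rng.shuffle(order)
    return "".join(coding[i] for i in order)


def translate(dna):
    """Translate the designed sequence back, codon by codon, as a consistency check."""
    return "".join(AMINO[dna[i:i + 3]] for i in range(0, len(dna), 3))


parents = ["MKEDWF", "FWDEKM"]  # toy parent sequences
dna = design_polynucleotide(parents, frag_len=3)
print(dna)
print(translate(dna))
```

Joining whole-codon blocks guarantees a single reading frame, and translating the result back confirms that it encodes the same residues as the parents in a rearranged order.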

In still yet another aspect, the invention provides a computer for designing the 
sequence of a synthetic polypeptide as broadly described above, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said 
machine-readable data storage medium, for processing said machine readable data to provide said 
synthetic polypeptide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polypeptide sequence. 

In a preferred embodiment, the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments and 
linking together said fragments in a different relationship relative to their linkage in the 
sequence of said parent polypeptide. 

In still yet another aspect, the invention resides in a computer for designing the 
sequence of a synthetic polynucleotide as broadly described above, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said 
machine-readable data storage medium, for processing said machine readable data to provide said 
synthetic polynucleotide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

In a preferred embodiment, the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid 
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

According to another aspect, the invention contemplates a composition, 
comprising an immunopotentiating agent selected from the group consisting of a synthetic 
polypeptide as broadly described above, a synthetic polynucleotide as broadly described 
above and a synthetic construct as broadly described above, together with a 
pharmaceutically acceptable carrier. 

The composition may optionally comprise an adjuvant. 

In a further aspect, the invention encompasses a method for modulating an 
immune response, which response is preferably directed against a pathogen or a cancer, 
comprising administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as 
broadly described above, a synthetic polynucleotide as broadly described above and a 
synthetic construct as broadly described above, or a composition as broadly described 
above. 

According to still a further aspect of the invention, there is provided a method for 
treatment and/or prophylaxis of a disease or condition, comprising administering to a 
patient in need of such treatment an effective amount of an immunopotentiating agent 
selected from the group consisting of a synthetic polypeptide as broadly described above, a 
synthetic polynucleotide as broadly described above and a synthetic construct as broadly 
described above, or a composition as broadly described above. 

The invention also encompasses the use of the synthetic polypeptide, the synthetic 
polynucleotide and the synthetic construct as broadly described above in the study and 
modulation of immune responses. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagrammatic representation showing the number of people living 
with AIDS in 1998 in various parts of the world and the most prevalent HIV clades in these 
regions. Estimates generated by UNAIDS. 

Figure 2 is a graphical representation showing trends in the incidence of the 
common HIV clades and estimates for the future. Graph from the International AIDS 
Vaccine Initiative (IAVI). 

Figure 3 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV gag [SEQ ID NO: 1] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV gag protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 4 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV pol [SEQ ID NO: 2] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV pol protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 5 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vif [SEQ ID NO: 3] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV vif protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 



WO 01/090197 



PCT/AU01/00622 




Figure 6 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vpr [SEQ ID NO: 4] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV vpr protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 7 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV tat [SEQ ID NO: 5] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV tat protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 8 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV rev [SEQ ID NO: 6] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV rev protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 9 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vpu [SEQ ID NO: 7] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV vpu protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 10 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV env [SEQ ID NO: 8] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV env protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 11 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV nef [SEQ ID NO: 9] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV nef protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 12 is a diagrammatic representation depicting the systematic segmentation 
of the designed degenerate consensus sequences for each HIV protein and the reverse 
translation of each segment into a DNA sequence. Also shown is the number of segments 
used during random rearrangement and amino acids that were removed. Amino acids 
surrounded by an open square were removed from the design, because degenerate codons 
to cater for the desired amino acid combination required too many degenerate bases to 
comply with the incorporation of degenerate sequence rules outlined in the description of 
the invention herein. Amino acids surrounded by an open circle were removed only in the 
segment concerned mainly because they were coded for in an oligonucleotide overlap 
region. Amino acids marked with an asterisk were designed differently in one fragment 
compared to the corresponding overlap region (see tat gene). 

Figure 13 is a diagrammatic representation showing the first and second most 
frequently used codons in mammals used to reverse translate HIV protein segments. Also 
shown are all first and second most frequently used degenerate codons for two amino acids 
where only one base is varied. Codons used where more than one base was varied were 
worked out in each case by comparing all the codons for each amino acid. The IUPAC 
codes for degenerate bases are also shown. 
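
To illustrate how a degenerate codon for two amino acids can be written with IUPAC codes, the sketch below merges two or more codons position by position. The IUPAC nucleotide codes are standard; the helper name `degenerate_codon` and the example codon pairs are assumptions chosen for illustration and are not taken from Figure 13.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases each code represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_codon(*codons):
    """Merge codons into one degenerate codon, one IUPAC letter per position."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*codons))


# Lys (AAG) and Arg (AGG) differ only at the second base.
print(degenerate_codon("AAG", "AGG"))
# Asp (GAC) and Glu (GAG) differ only at the third base.
print(degenerate_codon("GAC", "GAG"))
```

When the codons differ at a single position, the degenerate codon encodes exactly the two intended amino acids; when they differ at several positions, the merged codon may also encode unintended residues, which is why such cases need to be checked codon by codon.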

Figure 14 illustrates the construction plan for the HIV Savine showing the 
approximate sizes of the subcassettes, cassettes and full-length Savine cDNA and the 
restriction sites involved in joining them together. Also shown are the extra sequences 
added onto each subcassette during their design and a brief description of how the 
subcassettes, cassettes and full length cDNA were constructed and transferred into 
appropriate DNA plasmids. Description of full length construction: pA was cleaved with 
XhoI/SalI and cloned into XhoI arms of the B cassette; pAB was cleaved with XhoI and 
cloned into XhoI arms of the C cassette; full length construct is excisable with either 
XbaI/BamHI at the 5' end or BglII at the 3' end. Options for excising cassettes: A) 
XbaI/BamHI at the 5' end, BglII/XhoI at the 3' end; B) XbaI/BamHI at the 5' end, 
BglII/SalI at the 3' end; C) XbaI/BamHI at the 5' end, BglII/SalI at the 3' end. Cleaving 
plasmid vectors: pDNAVacc is cleavable with XbaI/XhoI (DNA vaccination); pBCB07 or 
pTK7.5 vectors are cleavable with BamHI/SalI (Recombinant Vaccinia); pAvipox vector 
pAF09 is cleavable with BamHI/SalI (Recombinant Avipox). 

Figure 15 shows the full length DNA (17253 bp) and protein sequence (5742 aas) 
of the HIV Savine construct. Fragment boundaries are shown, together with the position of 
each fragment in each designed HIV protein, fragment number (in brackets), spacer 
residues (two alanine residues) and which fragment the spacer was for (open boxes and 
arrows). The location of residual restriction site joining sequences corresponding to 
subcassette or cassette boundaries (shaded boxes) are also shown, along with start and stop 
codons, Kozak sequence, the location of the murine influenza virus CTL epitope sequence 
(near the 3' end), important restriction sites at each end and the position of each degenerate 
amino acid (indicated by 'X'). 

Figure 16 depicts the layout and position of oligonucleotides in the designed DNA 
sequence for subcassette A1. The sequences which anneal to the short amplification 
oligonucleotides are indicated by hatched boxes and the positions of oligonucleotide 
overlap regions are dark shaded. 

Figure 17: Panel (a) depicts the stepwise asymmetric PCR of the two halves of 
subcassette A1 (lanes 2-5 and 7-9, respectively) and final splicing together by SOEing 
(lane 10). DNA standards in lane 1 are pUC18 digested with Sau3AI. Panel (b) shows the 
stepwise ligation-mediated joining and PCR amplification of each cassette as indicated. 
DNA standards in lane 1 are SPP1 cut with EcoRI. 

Figure 18: Panel (a) shows a summary of the construction of the DNA vaccine 
plasmids that express one HIV Savine cassette. Panel (b) shows a summary of the 
construction of the plasmids used for marker rescue recombination to generate Vaccinia 
viruses expressing one HIV Savine cassette. Panel (c) shows a summary of the 
construction of the DNA vaccine plasmids which each express a version of the full-length 
HIV Savine cDNA. 

Figure 19 shows restimulation of HIV specific polyclonal CTL responses from 
three HIV-infected patients by the HIV Savine constructs. PBMCs from three different 
patients were restimulated for 7 days by infection with Vaccinia virus pools expressing the 
HIV Savine cassettes: Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2, 
VV-BC2 and VV-CC2. The restimulated PBMCs were then mixed with autologous LCLs 
(effector to target ratio of 50:1), which were either uninfected or infected with either 
Vaccinia viruses expressing the HIV proteins gag (VV-gag), env (VV-env) or pol (VV-pol), 
VV-HIV Savine pools 1 (light bars) or 2 (dark bars) or a control Vaccinia virus 
(VV-Lac) and the amount of 51Cr released used to determine percent specific lysis. K562 cells 
were used to determine the level of NK cell-mediated killing in their stimulated culture. 

Figure 20 is a diagrammatic representation showing CD4+ proliferation of 
PBMCs from HIV-1 infected patients restimulated with either Pool 1 or Pool 2 of the HIV-1 
Savine. Briefly, PBMCs were stained with CFSE and cultured for 6 days with or without 
VVs encoding either Pool 1 or Pool 2 of the HIV-1 Savine. Restimulated cells were then 
labelled with antibodies and analysed by FACS. 

Figure 21 is a graphical representation showing the CTL response in mice 
vaccinated with the HIV Savine. C57BL6 mice were immunised with the HIV-1 Savine 
DNA vaccine comprising the six plasmids described in Figure 18a (100 μg total DNA was 
given as 50 μg/leg i.m.). One week later Poxviruses (1x10^7 pfu) comprising Pool 1 of the 
HIV-1 Savine were used to boost the immune responses. Three weeks later splenocytes 
from these mice were restimulated with VV-Pool 1 or VV-Pool 2 for 5 days and the 
resultant effectors used in a 51Cr release cytotoxicity assay against targets infected with 
CTRVV, VV-pools or VV expressing the natural antigens from HIV-1. 

Figure 22 shows immune responses of HIV Immune Macaques (vaccinated with 
recombinant FPV expressing gag-pol and challenged with HIV-1 2 years prior to 
experiment). Monkeys 1 and 2 were immunised once at day 0 with VV Savine pool 1 
(three VVs which together express the entire HIV Savine). Monkey 3 was immunised 
twice with FPV-gag-pol, i.e., Day 0 is 3 weeks after first FPV-gag-pol immunisation. A) 
IFN-γ detection by ELISPOT of whole blood (0.5 mL, venous blood 
heparin-anticoagulated) stimulated with Aldrithiol-2 inactivated whole HIV-1 (20 hours, 20 
μg/mL). Plasma samples were then centrifuged (1000 x g) and assayed in duplicate for 
antigen-specific IFN using capture ELISA. B) Flow cytometric detection of HIV-1 specific 
CD69+/CD8+ T cells. Freshly isolated PBMCs were stimulated with inactivated HIV-1 as 
above for 16 hours, washed and labelled with the antibodies. Cells were then analysed 
using a FACScalibur™ flow cytometer and data analysed using Cell-Quest software. C) 
Flow cytometric detection of HIV-1 specific CD69+/CD4+ T cells carried out as in B). 

Figure 23 shows a diagram of a system used to carry out the instructions encoded
by the storage medium of Figures 28 and 29.

Figure 24 depicts a flow diagram showing an embodiment of a method for
designing synthetic polynucleotides and synthetic polypeptides of the invention.

Figure 25 shows an algorithm, which inter alia utilises the steps of the method
shown in Figure 24.

Figure 26 shows an example of applying the algorithm of Figure 25 to an input
consensus polyprotein sequence of Hepatitis C 1a to execute the segmentation of the
polyprotein sequence, the rearrangement of the segments, the linkage of the rearranged
segments and the outputting of synthetic polynucleotide and polypeptide sequences for the
preparation of Savines for treating and/or preventing Hepatitis C infection.
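
The segmentation, rearrangement and linkage steps named for Figure 26 can be illustrated with a short sketch. It is not the algorithm of Figure 25: the function names, the 30-residue window, the 15-residue step and the seeded random shuffle are all assumptions made for illustration. The window and step were chosen only because they reproduce the segment counts listed in Table A (for example, the 995 aa POL consensus polypeptide gives 66 segments, the last of 20 aa).

```python
import random


def segment(seq, window=30, step=15):
    """Split seq into windows of `window` residues starting every `step` residues.

    The window and step values are assumptions, not taken from Figure 25.
    """
    segments = []
    start = 0
    while start < len(seq):
        segments.append(seq[start:start + window])
        if start + window >= len(seq):
            break  # this window reached the end of the parent sequence
        start += step
    return segments


def rearrange(segments, seed=0):
    """Return the segments in a shuffled order (seeded so the result is repeatable)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def link(segments):
    """Join the rearranged segments into one synthetic sequence."""
    return "".join(segments)


# Hypothetical 995-residue parent; a real run would start from a consensus sequence.
parent = ("ACDEFGHIKLMNPQRSTVWY" * 50)[:995]
pieces = segment(parent)
synthetic = link(rearrange(pieces))
print(len(pieces), len(pieces[-1]), len(synthetic))
```

Because neighbouring windows overlap, every stretch of the parent shorter than the overlap stays intact in at least one segment even after the order is scrambled, which is one plausible reason for choosing an overlapping scheme.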

Figure 27 illustrates an example of applying the algorithm of Figure 25 to input
consensus melanocyte differentiation antigens (gp100, MART, TRP-1, Tyros, Trp-2,
MC1R, MUC1F and MUC1R) and to consensus melanoma specific antigens (RAGE,
GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a, NYNSO1b and
LAGE1) to facilitate segmentation of those sequences, to rearrange the segments, to link
the rearranged segments and to output synthetic polynucleotide and polypeptide sequences
for the preparation of Savines for treating and/or preventing melanoma.

Figure 28 shows a cross section of a magnetic storage medium. 

Figure 29 shows a cross section of an optically readable data storage medium.

Figure 30 shows six HIV Savine cassette sequences (A1 [SEQ ID NO: 393], A2
[SEQ ID NO: 399], B1 [SEQ ID NO: 395], B2 [SEQ ID NO: 401], C1 [SEQ ID NO: 397]
and C2 [SEQ ID NO: 403]). A1, B1 and C1 can be joined together using, for example,
convenient restriction enzyme sites provided at the ends of each cassette to construct an
embodiment of a full length HIV Savine [SEQ ID NO: 405]. A2, B2 and C2 can also be
joined together to provide another embodiment of a full length HIV Savine with 350 aa
mutations common in major HIV clades. The cassettes A/B/C can be joined into single
constructs using specific restriction enzyme sites incorporated after the start codon or
before the stop codon in the cassettes.

BRIEF DESCRIPTION OF THE SEQUENCES: SUMMARY TABLE
TABLE A




SEQ ID NO: 1


GAG consensus polypeptide


499 aa


SEQ ID NO: 2


POL consensus polypeptide


995 aa


SEQ ID NO: 3


VIF consensus polypeptide


192 aa


SEQ ID NO: 4


VPR consensus polypeptide


96 aa


SEQ ID NO: 5


TAT consensus polypeptide


102 aa


SEQ ID NO: 6


REV consensus polypeptide


123 aa


SEQ ID NO: 7


VPU consensus polypeptide


81 aa


SEQ ID NO: 8


ENV consensus polypeptide


o_>l aa 




SEQ ID NO: 9


NEF consensus polypeptide


206 aa


SEQ ID NO: 10


GAG segment 1


90 nts


SEQ ID NO: 11


Polypeptide encoded by SEQ ID NO: 10


30 aa


SEQ ID NO: 12


GAG segment 2


90 nts


SEQ ID NO: 13


Polypeptide encoded by SEQ ID NO: 12


30 aa 


SEQ ID NO: 14 


GAG segment 3 


90 nts 


SEQIDNO: 15 


Polypeptide encoded by SEQ ID NO: 14 


30 aa 


SEQ ID NO: 16 


GAG segment 4 


90 nts 


SEQIDNO: 17 


Polypeptide encoded by SEQ ID NO: 16 


30 aa 


SEQIDNO: 18 


GAG segment 5 


90 nts 


SEQ ID NO: 19 


Polypeptide encoded by SEQ ID NO: 18


30 aa 


SEQ ID NO: 20 


GAG segment 6 


90 nts 


SEQ ID NO: 21 


Polypeptide encoded by SEQ ID NO: 20 


30 aa 


SEQ ID NO: 22


GAG segment 7 


90 nts


SEQ ID NO: 23


Polypeptide encoded by SEQ ID NO: 22 


30 aa 


SEQ ID NO: 24 


GAG segment 8 


90nts 


SEQIDNO: 25 


Polypeptide encoded by SEQ ID NO: 24 


30 aa 


SEQ ID NO: 26 


GAG segment 9 


90nts 


SEQ ID NO: 27 


Polypeptide encoded by SEQ ID NO: 26 


30 aa 


SEQ ID NO: 28 


GAG segment 10 


90nts 


SEQ ID NO: 29 


Polypeptide encoded by SEQ ID NO: 28 


30 aa 


SEQIDNO: 30 


GAG segment 11


90nts 


SEQ ID NO: 31 


Polypeptide encoded by SEQ ID NO: 30 


30 aa 


SEQ ID NO: 32 


GAG segment 12 


90nts 


SEQIDNO: 33 


Polypeptide encoded by SEQ ID NO: 32 


30 aa 


SEQ ID NO: 34 


GAG segment 13 


90 nts 


SEQ ID NO: 35 


Polypeptide encoded by SEQ ID NO: 34 


30 aa 


SEQIDNO: 36 


GAG segment 14 


90 nts 


SEQ ID NO: 37 


Polypeptide encoded by SEQ ID NO: 36 


30 aa 


SEQIDNO: 38 


GAG segment 15 


90 nts 


SEQIDNO: 39 


Polypeptide encoded by SEQ ID NO: 38 


30 aa 


SEQ ID NO: 40 


GAG segment 16 


90 nts 


SEQ ID NO: 41 


Polypeptide encoded by SEQ ID NO: 40 


30 aa 


SEQ ID NO: 42 


GAG segment 17 


90 nts 


SEQIDNO: 43 


Polypeptide encoded by SEQ ID NO: 42 


30 aa 


SEQ ID NO: 44 


GAG segment 18 


90 nts 


SEQIDNO: 45 


Polypeptide encoded by SEQ ID NO: 44 


30 aa 


SEQ ID NO: 46 


GAG segment 19 


90 nts


SEQ ID NO: 47


Polypeptide encoded by SEQ ID NO: 46 


30 aa 


SEQ ID NO: 48 


GAG segment 20 


90 nts 


SEQIDNO: 49 


Polypeptide encoded by SEQ ID NO: 48 


30 aa 


SEQ ID NO: 50 


GAG segment 21 


90 nts 


SEQ ID NO: 51 


Polypeptide encoded by SEQ ID NO: 50 


30 aa 


SEQIDNO: 52 


GAG segment 22 


90 nts 


SEQIDNO: 53 


Polypeptide encoded by SEQ ID NO: 52 


30 aa 


SEQIDNO: 54 


GAG segment 23 


90 nts 


SEQ ID NO: 55 


Polypeptide encoded by SEQ ID NO: 54 


30 aa 


SEQIDNO: 56 


GAG segment 24 


90 nts 


SEQIDNO: 57 


Polypeptide encoded by SEQ ID NO: 56 


30 aa 


SEQIDNO: 58 


GAG segment 25 


90 nts 


SEQ ID NO: 59 


Polypeptide encoded by SEQ ID NO: 58 


30 aa 


SEQ ID NO: 60 


GAG segment 26 


90 nts 


SEQ ID NO: 61 


Polypeptide encoded by SEQ ID NO: 60 


30 aa 


SEQ ID NO: 62 


GAG segment 27 


90 nts 


SEQ ID NO: 63 


Polypeptide encoded by SEQ ID NO: 62 


30 aa 


SEQ ID NO: 64 


GAG segment 28 


90 nts 


SEQ ID NO: 65


Polypeptide encoded by SEQ ID NO: 64


30 aa 


SEQIDNO: 66 


GAG segment 29 


90 nts 


SEQIDNO: 67 


Polypeptide encoded by SEQ ID NO: 66 


30 aa 


SEQIDNO: 68 


GAG segment 30 


90 nts 


SEQIDNO: 69 


Polypeptide encoded by SEQ ID NO: 68 


30 aa 


SEQIDNO: 70 


GAG segment 31 


90 nts


SEQ ID NO: 71


Polypeptide encoded by SEQ ID NO: 70 


30 aa 


SEQ ID NO: 72 


GAG segment 32 


90nts 


SEQ ID NO: 73 


Polypeptide encoded by SEQ ID NO: 72 


30 aa 


SEQ ID NO: 74 


GAG segment 33 


57nts 


SEQ ID NO: 75 


Polypeptide encoded by SEQ ID NO: 74 


19 aa 


SEQ ID NO: 76 


POL segment 1 


90nts 


SEQ ID NO: 77 


Polypeptide encoded by SEQ ID NO: 76 


30 aa 


SEQ ID NO: 78 


POL segment 2 


90nts 


SEQ ID NO: 79 


Polypeptide encoded by SEQ ID NO: 78 


30 aa 


SEQ ID NO: 80 


POL segment 3 


90nts 


SEQ ID NO: 81 


Polypeptide encoded by SEQ ID NO: 80 


30 aa 


SEQ ID NO: 82 


POL segment 4 


90nts 


SEQ ID NO: 83


Polypeptide encoded by SEQ ID NO: 82 


30 aa 


SEQ ID NO: 84 


POL segment 5 


90nts 


SEQ ID NO: 85 


Polypeptide encoded by SEQ ID NO: 84 


30 aa 


SEQ ID NO: 86 


POL segment 6 


90nts 


SEQ ID NO: 87 


Polypeptide encoded by SEQ ID NO: 86 


30 aa 


SEQ ID NO: 88 


POL segment 7 


90nts 


SEQ ID NO: 89


Polypeptide encoded by SEQ ID NO: 88


30 aa 


SEQ ID NO: 90 


POL segment 8 


90nts 


SEQ ID NO: 91 


Polypeptide encoded by SEQ ID NO: 90 


30 aa 


SEQ ID NO: 92 


POL segment 9 


90nts 


SEQ ID NO: 93 


Polypeptide encoded by SEQ ID NO: 92 


30 aa 


SEQ ID NO: 94 


POL segment 10 


90 nts


SEQ ID NO: 95


Polypeptide encoded by SEQ ID NO: 94 


30 aa 


SEQ ID NO: 96 


POL segment 11


90 nts


SEQ ID NO: 97


Polypeptide encoded by SEQ ID NO: 96


30 aa


SEQ ID NO: 98


POL segment 12


90 nts


SEQ ID NO: 99


Polypeptide encoded by SEQ ID NO: 98


30 aa


SEQ ID NO: 100


POL segment 13


90 nts


SEQ ID NO: 101


Polypeptide encoded by SEQ ID NO: 100


30 aa


SEQ ID NO: 102


POL segment 14


90 nts


SEQ ID NO: 103


Polypeptide encoded by SEQ ID NO: 102


30 aa


SEQ ID NO: 104


POL segment 15


90 nts


SEQ ID NO: 105


Polypeptide encoded by SEQ ID NO: 104


30 aa


SEQ ID NO: 106


POL segment 16


90 nts


SEQ ID NO: 107


Polypeptide encoded by SEQ ID NO: 106


30 aa


SEQ ID NO: 108


POL segment 17


90 nts


SEQ ID NO: 109


Polypeptide encoded by SEQ ID NO: 108


30 aa


SEQ ID NO: 110


POL segment 18


90 nts


SEQ ID NO: 111


Polypeptide encoded by SEQ ID NO: 110


30 aa


SEQIDNO: 112 


POL segment 19 


90 nts 


SEQIDNO: 113 


Polypeptide encoded by SEQ ID NO: 112


30 aa 


SEQIDNO: 114 


POL segment 20 


90 nts 


SEQIDNO: 115 


Polypeptide encoded by SEQ ID NO: 114 


30 aa 


SEQIDNO: 116 


POL segment 21 


90 nts 


SEQIDNO: 117 


Polypeptide encoded by SEQ ID NO: 116


30 aa 


SEQIDNO: 118 


POL segment 22 


90 nts 



WO 01/090197 



PCT/AU01/00622 



SEQ ID NO: 119


Polypeptide encoded by SEQ ID NO: 118


30 aa


SEQ ID NO: 120


POL segment 23


90 nts


SEQ ID NO: 121 


Polypeptide encoded by SEQ ID NO: 120 


30 aa 


SEQ ID NO: 122 


POL segment 24 


90 nts 


SEQ ID NO: 123 


Polypeptide encoded by SEQ ID NO: 122 


30 aa 


SEQ ID NO: 124 


POL segment 25 


90 nts 


SEQ ID NO: 125 


Polypeptide encoded by SEQ ID NO: 124 


30 aa 


SEQ ID NO: 126 


POL segment 26 


90 nts 


SEQ ID NO: 127 


Polypeptide encoded by SEQ ID NO: 126 


30 aa 


SEQ ID NO: 128 


POL segment 27 


90 nts 


SEQ ID NO: 129 


Polypeptide encoded by SEQ ID NO: 128 


30 aa 


SEQ ID NO: 130 


POL segment 28 


90 nts 


SEQ ID NO: 131


Polypeptide encoded by SEQ ID NO: 130 


30 aa 


SEQ ID NO: 132 


POL segment 29 


90 nts


SEQ ID NO: 133


Polypeptide encoded by SEQ ID NO: 132 


30 aa


SEQ ID NO: 134


POL segment 30


90 nts


SEQ ID NO: 135


Polypeptide encoded by SEQ ID NO: 134


30 aa


SEQ ID NO: 136


POL segment 31


90 nts


SEQ ID NO: 137


Polypeptide encoded by SEQ ID NO: 136


30 aa


SEQ ID NO: 138 


POL segment 32 


90 nts 


SEQ ID NO: 139


Polypeptide encoded by SEQ ID NO: 138 


30 aa 


SEQ ID NO: 140 


POL segment 33 


90 nts 


SEQ ID NO: 141 


Polypeptide encoded by SEQ ID NO: 140 


30 aa 


SEQ ID NO: 142 


POL segment 34


90 nts


SEQ ID NO: 143


Polypeptide encoded by SEQ ID NO: 142 


30 aa 


SEQ ID NO: 144 


POL segment 35 


90nts 


SEQ ID NO: 145 


Polypeptide encoded by SEQ ID NO: 144 


30 aa 


SEQ ID NO: 146 


POL segment 36 


90 nts 


SEQ ID NO: 147 


Polypeptide encoded by SEQ ID NO: 146 


30 aa 


SEQ ID NO: 148 


POL segment 37 


90 nts 


SEQ ID NO: 149 


Polypeptide encoded by SEQ ID NO: 148 


30 aa 


SEQ ID NO: 150 


POL segment 38 


90 nts 


SEQ ID NO: 151 


Polypeptide encoded by SEQ ID NO: 150 


30 aa 


SEQ ID NO: 152 


POL segment 39 


90 nts 


SEQ ID NO: 153 


Polypeptide encoded by SEQ ID NO: 152 


30 aa 


SEQ ID NO: 154 


POL segment 40 


90 nts 


SEQ ID NO: 155 


Polypeptide encoded by SEQ ID NO: 154 


30 aa 


SEQ ID NO: 156 


POL segment 41 


90 nts 


SEQ ID NO: 157 


Polypeptide encoded by SEQ ID NO: 156


30 aa 


SEQ ID NO: 158 


POL segment 42 


90 nts 


SEQ ID NO: 159 


Polypeptide encoded by SEQ ID NO: 158 


30 aa 


SEQ ID NO: 160 


POL segment 43 


90 nts 


SEQ ID NO: 161


Polypeptide encoded by SEQ ID NO: 160


30 aa 


SEQ ID NO: 162 


POL segment 44 


90 nts 


SEQ ID NO: 163 


Polypeptide encoded by SEQ ID NO: 162 


30 aa 


SEQ ID NO: 164 


POL segment 45 


90 nts 


SEQ ID NO: 165 


Polypeptide encoded by SEQ ID NO: 164 


30 aa 


SEQ ID NO: 166 


POL segment 46 


90 nts


SEQ ID NO: 167


Polypeptide encoded by SEQ ID NO: 166


30 aa 


SEQIDNO: 168 


POL segment 47 


90nts 


SEQIDNO: 169 


Polypeptide encoded by SEQ ID NO: 168 


30 aa


SEQ ID NO: 170


POL segment 48 


90nts 


SEQIDNO: 171 


Polypeptide encoded by SEQ ID NO: 170 


30 aa 


SEQ ID NO: 172 


POL segment 49 


90 nts 


SEQIDNO: 173 


Polypeptide encoded by SEQ ID NO: 172 


30 aa 


SEQIDNO: 174 


POL segment 50 


90 nts 


SEQIDNO: 175 


Polypeptide encoded by SEQ ID NO: 174 


30 aa 


SEQIDNO: 176 


POL segment 51 


90 nts 


SEQ ID NO: 177 


Polypeptide encoded by SEQ ID NO: 176 


30 aa 


SEQIDNO: 178 


POL segment 52 


90 nts 


SEQIDNO: 179 


Polypeptide encoded by SEQ ID NO: 178 


30 aa


SEQ ID NO: 180


POL segment 53 


90 nts


SEQ ID NO: 181


Polypeptide encoded by SEQ ID NO: 180


30 aa


SEQ ID NO: 182


POL segment 54


90 nts


SEQ ID NO: 183


Polypeptide encoded by SEQ ID NO: 182


30 aa


SEQ ID NO: 184


POL segment 55


90 nts


SEQ ID NO: 185


Polypeptide encoded by SEQ ID NO: 184


30 aa 


SEQIDNO: 186 


POL segment 56 


90 nts 


SEQ ID NO: 187 


Polypeptide encoded by SEQ ID NO: 186


30 aa 


SEQIDNO: 188 


POL segment 57 


90 nts 


SEQIDNO: 189 


Polypeptide encoded by SEQ ID NO: 188 


30 aa 


SEQ ID NO: 190 


POL segment 58 


90 nts


SEQ ID NO: 191


Polypeptide encoded by SEQ ID NO: 190 


30 aa 


SEQIDNO: 192 


POL segment 59 


90nts 


SEQIDNO: 193 


Polypeptide encoded by SEQ ID NO: 192 


30 aa 


SEQ ID NO: 194 


POL segment 60 


90nts 


SEQ ID NO: 195 


Polypeptide encoded by SEQ ID NO: 194 


30 aa 


SEQ ID NO: 196


POL segment 61 


90nts 


SEQ ID NO: 197 


Polypeptide encoded by SEQ ID NO: 196 


30 aa 


SEQ ID NO: 198 


POL segment 62 


90nts 


SEQIDNO: 199 


Polypeptide encoded by SEQ ID NO: 198 


30 aa 


SEQ ID NO: 200 


POL segment 63 


90nts 


SEQIDNO: 201 


Polypeptide encoded by SEQ ID NO: 200 


30 aa 


SEQ ID NO: 202 


POL segment 64 


90nts 


SEQ ID NO: 203 


Polypeptide encoded by SEQ ID NO: 202 


30 aa 


SEQ ID NO: 204 


POL segment 65 


90nts 


SEQ ID NO: 205 


Polypeptide encoded by SEQ ID NO: 204 


30 aa 


SEQ ID NO: 206 


POL segment 66 


60 nts 


SEQ ID NO: 207 


Polypeptide encoded by SEQ ID NO: 206 


20 aa 


SEQ ID NO: 208 


VIF segment 1 


90 nts


SEQ ID NO: 209


Polypeptide encoded by SEQ ID NO: 208


30 aa 


SEQ ID NO: 210 


VIF segment 2 


90 nts 


SEQIDNO: 211 


Polypeptide encoded by SEQ ID NO: 210 


30 aa 


SEQ ID NO: 212 


VIF segment 3 


90 nts 


SEQ ID NO: 213 


Polypeptide encoded by SEQ ID NO: 212 


30 aa 


SEQIDNO: 214 


VIF segment 4 


90 nts


SEQ ID NO: 215


Polypeptide encoded by SEQ ID NO: 214 


30 aa


SEQ ID NO: 216


VIF segment 5


90 nts


SEQ ID NO: 217


Polypeptide encoded by SEQ ID NO: 216 


30 aa 


SEQIDNO: 218 


VIF segment 6 


90nts 


SEQIDNO: 219 


Polypeptide encoded by SEQ ID NO: 218 


30 aa 


SEQ ID NO: 220 


VIF segment 7 


90 nts 


SEQ ID NO: 221 


Polypeptide encoded by SEQ ID NO: 220 


30 aa 


SEQ ID NO: 222 


VIF segment 8 


90 nts 


SEQ ID NO: 223 


Polypeptide encoded by SEQ ID NO: 222 


30 aa 


SEQ ID NO: 224 


VIF segment 9 


90 nts 


SEQ ID NO: 225 


Polypeptide encoded by SEQ ID NO: 224 


30 aa 


SEQ ID NO: 226 


VIF segment 10 


90 nts


SEQ ID NO: 227


Polypeptide encoded by SEQ ID NO: 226


30 aa


SEQ ID NO: 228


VIF segment 11


90 nts


SEQ ID NO: 229


Polypeptide encoded by SEQ ID NO: 228


30 aa


SEQ ID NO: 230


VIF segment 12


81 nts


SEQ ID NO: 231


Polypeptide encoded by SEQ ID NO: 230


27 aa


SEQ ID NO: 232


VPR segment 1


90 nts


SEQ ID NO: 233


Polypeptide encoded by SEQ ID NO: 232


30 aa 


SEQ ID NO: 234 


VPR segment 2 


90 nts 


SEQ ID NO: 235


Polypeptide encoded by SEQ ID NO: 234 


30 aa 


SEQ ID NO: 236 


VPR segment 3 


90 nts 


SEQ ID NO: 237


Polypeptide encoded by SEQ ID NO: 236 


30 aa 


SEQIDNO: 238 


VPR segment 4 


90 nts


SEQ ID NO: 239


Polypeptide encoded by SEQ ID NO: 238


30 aa


SEQ ID NO: 240


VPR segment 5


90 nts


SEQ ID NO: 241


Polypeptide encoded by SEQ ID NO: 240


30 aa


SEQ ID NO: 242


VPR segment 6


63 nts


SEQ ID NO: 243


Polypeptide encoded by SEQ ID NO: 242


21 aa


SEQ ID NO: 244


TAT segment 1


90 nts


SEQ ID NO: 245


Polypeptide encoded by SEQ ID NO: 244


30 aa


SEQ ID NO: 246


TAT segment 2


90 nts


SEQ ID NO: 247


Polypeptide encoded by SEQ ID NO: 246


30 aa


SEQ ID NO: 248


TAT segment 3


90 nts


SEQ ID NO: 249


Polypeptide encoded by SEQ ID NO: 248


30 aa


SEQ ID NO: 250


TAT segment 4


90 nts


SEQ ID NO: 251


Polypeptide encoded by SEQ ID NO: 250


30 aa


SEQ ID NO: 252


TAT segment 5


90 nts


SEQ ID NO: 253


Polypeptide encoded by SEQ ID NO: 252


30 aa


SEQ ID NO: 254


TAT segment 6


81 nts


SEQ ID NO: 255


Polypeptide encoded by SEQ ID NO: 254


27 aa


SEQ ID NO: 256


REV segment 1


90 nts


SEQ ID NO: 257 


Polypeptide encoded by SEQ ID NO: 256 


30 aa 


SEQ ID NO: 258 


REV segment 2 


90 nts 


SEQ ID NO: 259 


Polypeptide encoded by SEQ ID NO: 258 


30 aa 


SEQ ID NO: 260 


REV segment 3 


90 nts 


SEQ ID NO: 261 


Polypeptide encoded by SEQ ID NO: 260 


30 aa 


SEQ ID NO: 262 


REV segment 4 


90 nts


SEQ ID NO: 263


Polypeptide encoded by SEQ ID NO: 262


30 aa


SEQ ID NO: 264


REV segment 5


90 nts


SEQ ID NO: 265


Polypeptide encoded by SEQ ID NO: 264


30 aa


SEQ ID NO: 266


REV segment 6


90 nts


SEQ ID NO: 267


Polypeptide encoded by SEQ ID NO: 266


30 aa


SEQ ID NO: 268


REV segment 7


90 nts


SEQ ID NO: 269


Polypeptide encoded by SEQ ID NO: 268


30 aa


SEQ ID NO: 270


REV segment 8


54 nts


SEQ ID NO: 271


Polypeptide encoded by SEQ ID NO: 270


18 aa


SEQ ID NO: 272


VPU segment 1


90 nts


SEQ ID NO: 273


Polypeptide encoded by SEQ ID NO: 272


30 aa


SEQ ID NO: 274


VPU segment 2


90 nts


SEQ ID NO: 275


Polypeptide encoded by SEQ ID NO: 274


30 aa


SEQ ID NO: 276


VPU segment 3


90 nts


SEQ ID NO: 277


Polypeptide encoded by SEQ ID NO: 276


30 aa


SEQ ID NO: 278


VPU segment 4


90 nts


SEQ ID NO: 279


Polypeptide encoded by SEQ ID NO: 278


30 aa


SEQ ID NO: 280


VPU segment 5


63 nts


SEQ ID NO: 281


Polypeptide encoded by SEQ ID NO: 280


21 aa 


SEQ ID NO: 282


ENV segment 1


90 nts


SEQ ID NO: 283


Polypeptide encoded by SEQ ID NO: 282


30 aa


SEQ ID NO: 284


ENV segment 2


90 nts


SEQ ID NO: 285


Polypeptide encoded by SEQ ID NO: 284


30 aa


SEQ ID NO: 286


ENV segment 3


90 nts


SEQ ID NO: 287


Polypeptide encoded by SEQ ID NO: 286


30 aa


SEQ ID NO: 288


ENV segment 4


90 nts


SEQ ID NO: 289


Polypeptide encoded by SEQ ID NO: 288 


30 aa 


SEQ ID NO: 290 


ENV segment 5 


90 nts 


SEQ ID NO: 291 


Polypeptide encoded by SEQ ID NO: 290 


30 aa 


SEQ ID NO: 292


ENV segment 6 


90 nts 


SEQ ID NO: 293 


Polypeptide encoded by SEQ ID NO: 292 


30 aa 


SEQ ID NO: 294 


ENV segment 7 


90 nts 


SEQ ID NO: 295 


Polypeptide encoded by SEQ ID NO: 294 


30 aa 


SEQ ID NO: 296 


ENV segment 8 


90 nts 


SEQ ID NO: 297 


Polypeptide encoded by SEQ ID NO: 296 


30 aa 


SEQ ID NO: 298 


ENV segment 9 


57 nts 


SEQ ID NO: 299 


Polypeptide encoded by SEQ ID NO: 298 


19 aa 


SEQ ID NO: 300 


GAP A segment 1 


90 nts 


SEQ ID NO: 301


Polypeptide encoded by SEQ ID NO: 300


30 aa


SEQ ID NO: 302


GAP A segment 2


90 nts


SEQ ID NO: 303


Polypeptide encoded by SEQ ID NO: 302


30 aa


SEQ ID NO: 304


GAP A segment 3


90 nts


SEQ ID NO: 305


Polypeptide encoded by SEQ ID NO: 304


30 aa


SEQ ID NO: 306 


GAP A segment 4 


90 nts 


SEQ ID NO: 307 


Polypeptide encoded by SEQ ID NO: 306 


30 aa 


SEQ ID NO: 308 


GAP A segment 5 


90 nts 


SEQ ID NO: 309 


Polypeptide encoded by SEQ ID NO: 308 


30 aa 


SEQ ID NO: 310 


GAP A segment 6 


90 nts


SEQ ID NO: 311


Polypeptide encoded by SEQ ID NO: 310 


30 aa 


SEQIDNO: 312 


GAP A segment 7 


75nts 


SEQIDNO: 313 


Polypeptide encoded by SEQ ID NO: 312


25 aa


SEQIDNO: 314 


GAP B segment 1 


90nts 


SEQIDNO: 315 


Polypeptide encoded by SEQ ID NO: 314 


30 aa 


SEQIDNO: 316 


GAP B segment 2 


90nts 


SEQIDNO: 317 


Polypeptide encoded by SEQ ID NO: 316 


30 aa 


SEQIDNO: 318 


GAP B segment 3 


90nts 


SEQIDNO: 319 


Polypeptide encoded by SEQ ID NO: 318 


30 aa 


SEQIDNO: 320 


GAP B segment 4 


90 nts 


SEQIDNO: 321 


Polypeptide encoded by SEQ ID NO: 320 


30 aa 


SEQ ID NO: 322 


GAP B segment 5 


90 nts 


SEQ ID NO: 323 


Polypeptide encoded by SEQ ID NO: 322 


30 aa 


SEQ ID NO: 324 


GAP B segment 6 


90 nts 


SEQ ID NO: 325 


Polypeptide encoded by SEQ ID NO: 324 


30 aa 


SEQ ID NO: 326 


GAP B segment 7 


90 nts 


SEQ ID NO: 327 


Polypeptide encoded by SEQ ID NO: 326 


30 aa 


SEQ ID NO: 328 


GAP B segment 8 


90 nts


SEQ ID NO: 329


Polypeptide encoded by SEQ ID NO: 328


30 aa


SEQIDNO: 330 


GAP B segment 9 


90 nts 


SEQIDNO: 331 


Polypeptide encoded by SEQ ID NO: 330 


30 aa 


SEQIDNO: 332 


GAP B segment 10 


90 nts 


SEQIDNO: 333 


Polypeptide encoded by SEQ ID NO: 332 


30 aa 


SEQIDNO: 334 


GAP B segment 11


90 nts


SEQ ID NO: 335


Polypeptide encoded by SEQ ID NO: 334


30 aa


SEQ ID NO: 336


GAP B segment 12 


90nts 


SEQ ID NO: 337 


Polypeptide encoded by SEQ ID NO: 336 


30 aa 


SEQ ID NO: 338 


GAPB segment 13 


90 nts 


SEQ ID NO: 339 


Polypeptide encoded by SEQ ID NO: 338 


30 aa 


SEQ ID NO: 340 


GAP B segment 14 


90 nts 


SEQ ID NO: 341 


Polypeptide encoded by SEQ ID NO: 340 


30 aa 


SEQ ID NO: 342 


GAPB segment 15 


90 nts 


SEQ ID NO: 343 


Polypeptide encoded by SEQ ID NO: 342 


30 aa 


SEQ ID NO: 344 


GAPB segment 16 


90 nts 


SEQ ID NO: 345 


Polypeptide encoded by SEQ ID NO: 344 


30 aa 


SEQ ID NO: 346 


GAP B segment 17 


90 nts 


SEQ ID NO: 347 


Polypeptide encoded by SEQ ID NO: 346 


30 aa 


SEQ ID NO: 348 


GAPB segment 18 


90 nts 


SEQ ID NO: 349 


Polypeptide encoded by SEQ ID NO: 348 


30 aa 


SEQ ID NO: 350


GAP B segment 19


90 nts


SEQ ID NO: 351


Polypeptide encoded by SEQ ID NO: 350


30 aa


SEQ ID NO: 352


GAP B segment 20


90 nts


SEQ ID NO: 353


Polypeptide encoded by SEQ ID NO: 352


30 aa 


SEQIDNO: 354 


GAPB segment 21 


90 nts 


SEQIDNO: 355 


Polypeptide encoded by SEQ ID NO: 354 


30 aa 


SEQ ID NO: 356 


GAPB segment 22 


90 nts 


SEQIDNO: 357 


Polypeptide encoded by SEQ ID NO: 356 


30 aa 


SEQ ID NO: 358 


GAPB segment 23 


90 nts


SEQ ID NO: 359


Polypeptide encoded by SEQ ID NO: 358


30 aa


SEQ ID NO: 360


GAP B segment 24


90 nts


SEQ ID NO: 361


Polypeptide encoded by SEQ ID NO: 360


30 aa


SEQ ID NO: 362


GAP B segment 25


90 nts


SEQ ID NO: 363


Polypeptide encoded by SEQ ID NO: 362


30 aa


SEQ ID NO: 364


GAP B segment 26


66 nts


SEQ ID NO: 365


Polypeptide encoded by SEQ ID NO: 364


22 aa


SEQ ID NO: 366


NEF segment 1


90 nts


SEQ ID NO: 367


Polypeptide encoded by SEQ ID NO: 366


30 aa


SEQ ID NO: 368


NEF segment 2


90 nts


SEQ ID NO: 369


Polypeptide encoded by SEQ ID NO: 368


30 aa


SEQ ID NO: 370


NEF segment 3


90 nts


SEQ ID NO: 371


Polypeptide encoded by SEQ ID NO: 370


30 aa


SEQ ID NO: 372


NEF segment 4


90 nts


SEQ ID NO: 373


Polypeptide encoded by SEQ ID NO: 372


30 aa


SEQ ID NO: 374


NEF segment 5


90 nts


SEQ ID NO: 375


Polypeptide encoded by SEQ ID NO: 374


30 aa


SEQ ID NO: 376


NEF segment 6


90 nts


SEQ ID NO: 377


Polypeptide encoded by SEQ ID NO: 376


30 aa


SEQ ID NO: 378


NEF segment 7


90 nts


SEQ ID NO: 379


Polypeptide encoded by SEQ ID NO: 378


30 aa 


SEQIDNO: 380 


NEF segment 8 


90 nts


SEQIDNO: 381 


Polypeptide encoded by SEQ ID NO: 380 


30 aa 


SEQ ID NO: 382 


NEF segment 9 


90 nts


SEQ ID NO: 383


Polypeptide encoded by SEQ ID NO: 382 


30 aa 


SEQIDNO: 384 


NEF segment 10 


90nts 


SEQ ID NO: 385 


Polypeptide encoded by SEQ ID NO: 384 


30 aa 


SEQIDNO: 386 


NEF segment 11 


90nts 


SEQIDNO: 387 


Polypeptide encoded by SEQ ID NO: 386 


30 aa 


SEQIDNO: 388 


NEF segment 12 


90nts 


SEQIDNO: 389 


Polypeptide encoded by SEQ ID NO: 388 


30 aa 


SEQIDNO: 390 


NEF segment 13 


78nts 


SEQIDNO: 391 


Polypeptide encoded by SEQ ID NO: 390 


26 aa 


SEQIDNO: 392 


HIV Cassette A1


5703 nts 


SEQ ID NO: 393 


Polypeptide encoded by SEQ ID NO: 392


1896 aa 


SEQIDNO: 394 


HIV Cassette B1


5685 nts 


SEQIDNO: 395 


Polypeptide encoded by SEQ ID NO: 394 


1890 aa 


SEQ ID NO: 396 


HIV Cassette C1


5925 nts 


SEQIDNO: 397 


Polypeptide encoded by SEQ ID NO: 396 


1967 aa


SEQIDNO: 398 


HIV Cassette A2 


5703 nts 


SEQIDNO: 399 


Polypeptide encoded by SEQ ID NO: 398 


1896 aa 


SEQ ID NO: 400 


HIV Cassette B2 


5685 nts 


SEQ ID NO: 401 


Polypeptide encoded by SEQ ID NO: 400 


1890 aa 


SEQ ID NO: 402 


HIV Cassette C2 


5925 nts 


SEQ ID NO: 403 


Polypeptide encoded by SEQ ID NO: 402


1967 aa 


SEQ ID NO: 404 


HIV complete Savine


17244 nts 


SEQ ID NO: 405 


Polypeptide encoded by SEQ ID NO: 404 


5747 aa 


SEQ ID NO: 406 


HepC1a consensus polyprotein sequence


3011 aa


SEQ ID NO: 407


HepCla segment 1 


90 nts 


SEQ ID NO: 408 


Polypeptide encoded by SEQ ID NO: 407 


30 aa 


SEQ ID NO: 409 


HepCla segment 2 


90 nts 


SEQ ID NO: 410 


Polypeptide encoded by SEQ ID NO: 409 


30 aa 


SEQ ID NO: 411 


HepCla segment 3 


90 nts 


SEQ ID NO: 412 


Polypeptide encoded by SEQ ID NO: 411


30 aa 


SEQ ID NO: 413 


HepCla segment 4 


90 nts 


SEQ ID NO: 414 


Polypeptide encoded by SEQ ID NO: 413 


30 aa 


SEQ ID NO: 415 


HepCla segment 5 


90 nts 


SEQ ID NO: 416 


Polypeptide encoded by SEQ ID NO: 415 


30 aa 


SEQ ID NO: 417 


HepCla segment 6 


90 nts 


SEQ ID NO: 418 


Polypeptide encoded by SEQ ID NO: 417 


30 aa 


SEQ ID NO: 419 


HepCla segment 7 


90 nts 


SEQ ID NO: 420 


Polypeptide encoded by SEQ ID NO: 419 


30 aa 


SEQ ID NO: 421 


HepCla segment 8 


90 nts 


SEQ ID NO: 422 


Polypeptide encoded by SEQ ID NO: 421 


30 aa 


SEQ ID NO: 423 


HepCla segment 9 


90 nts 


SEQ ID NO: 424 


Polypeptide encoded by SEQ ID NO: 423 


30 aa


SEQ ID NO: 425


HepCla segment 10 


90 nts 


SEQ ID NO: 426 


Polypeptide encoded by SEQ ID NO: 425 


30 aa 


SEQ ID NO: 427 


HepCla segment 11 


90 nts 


SEQ ID NO: 428 


Polypeptide encoded by SEQ ID NO: 427 


30 aa 


SEQ ID NO: 429 


HepCla segment 12 


90 nts 


SEQ ID NO: 430 


Polypeptide encoded by SEQ ID NO: 429 


30 aa


SEQ ID NO: 431


HepCla segment 13 


90nts 


SEQIDNO:432 


Polypeptide encoded by SEQ ID NO: 431


30 aa 


SEQIDNO:433 


HepCla segment 14 


90nts 


SEQ ID NO: 434


Polypeptide encoded by SEQ ID NO: 433 


30 aa 


SEQIDNO:435 


HepCla segment 15 


90nts 


SEQ ID NO: 436 


Polypeptide encoded by SEQ ID NO: 435 


30 aa 


SEQ ID NO: 437 


HepCla segment 16 


90 nts 


SEQ ID NO: 438 


Polypeptide encoded by SEQ ID NO: 437 


30 aa 


SEQ ID NO: 439 


HepCla segment 17 


90 nts 


SEQ ID NO: 440 


Polypeptide encoded by SEQ ID NO: 439 


30 aa 


SEQ ID NO: 441 


HepCla segment 18 


90 nts 


SEQ ID NO: 442


Polypeptide encoded by SEQ ID NO: 441 


30 aa 


SEQ ID NO: 443 


HepC1a segment 19


90 nts 


SEQ ID NO: 444 


Polypeptide encoded by SEQ ID NO: 443 


30 aa 


SEQ ID NO: 445 


HepCla segment 20 


90 nts 


SEQ ID NO: 446 


Polypeptide encoded by SEQ ID NO: 445 


30 aa 


SEQ ID NO: 447 


HepCla segment 21 


90 nts 


SEQ ID NO: 448 


Polypeptide encoded by SEQ ID NO: 447 


30 aa 


SEQ ID NO: 449 


HepCla segment 22 


90 nts 


SEQ ID NO: 450 


Polypeptide encoded by SEQ ID NO: 449 


30 aa 


SEQ ID NO: 451 


HepC1a segment 23


90 nts 


SEQ ID NO: 452 


Polypeptide encoded by SEQ ID NO: 451 


30 aa 


SEQ ID NO: 453 


HepCla segment 24 


90 nts 


SEQ ID NO: 454 


Polypeptide encoded by SEQ ID NO: 453 


30 aa


SEQ ID NO: 455


HepC1a segment 25


90 nts


SEQ ID NO: 456


Polypeptide encoded by SEQ ID NO: 455


30 aa


SEQ ID NO: 457


HepCla segment 26 


90 nts 


SEQ ID NO: 458 


Polypeptide encoded by SEQ ID NO: 457


30 aa 


SEQ ID NO: 459 


HepCla segment 27 


90 nts 


SEQ ID NO: 460 


Polypeptide encoded by SEQ ID NO: 459 


30 aa 


SEQ ID NO: 461 


HepCla segment 28 


90 nts 


SEQ ID NO: 462 


Polypeptide encoded by SEQ ID NO: 461 


30 aa 


SEQ ID NO: 463 


HepCla segment 29 


90 nts 


SEQ ID NO: 464 


Polypeptide encoded by SEQ ID NO: 463 


30 aa 


SEQ ID NO: 465 


HepC1a segment 30


90 nts 


SEQ ID NO: 466 


Polypeptide encoded by SEQ ID NO: 465 


30 aa


SEQ ID NO: 467


HepC1a segment 31


90 nts


SEQ ID NO: 468


Polypeptide encoded by SEQ ID NO: 467


30 aa


SEQ ID NO: 469


HepC1a segment 32


90 nts


SEQ ID NO: 470


Polypeptide encoded by SEQ ID NO: 469


30 aa


SEQ ID NO: 471


HepC1a segment 33


90 nts


SEQ ID NO: 472


Polypeptide encoded by SEQ ID NO: 471


30 aa


SEQ ID NO: 473


HepC1a segment 34


90 nts


SEQ ID NO: 474 


Polypeptide encoded by SEQ ID NO: 473 


30 aa 


SEQ ID NO: 475 


HepC1a segment 35


90 nts 


SEQ ID NO: 476 


Polypeptide encoded by SEQ ID NO: 475 


30 aa 


SEQ ID NO: 477 


HepC1a segment 36


90 nts 


SEQ ID NO: 478 


Polypeptide encoded by SEQ ID NO: 477 


30 aa


SEQ ID NO: 479


HepC1a segment 37


90 nts


SEQ ID NO: 480


Polypeptide encoded by SEQ ID NO: 479


30 aa


SEQ ID NO: 481


HepC1a segment 38


90 nts


SEQ ID NO: 482


Polypeptide encoded by SEQ ID NO: 481


30 aa


SEQ ID NO: 483


HepC1a segment 39


90 nts


SEQ ID NO: 484


Polypeptide encoded by SEQ ID NO: 483


30 aa


SEQ ID NO: 485


HepC1a segment 40


90 nts


SEQ ID NO: 486


Polypeptide encoded by SEQ ID NO: 485


30 aa


SEQ ID NO: 487


HepC1a segment 41


90 nts


SEQ ID NO: 488


Polypeptide encoded by SEQ ID NO: 487


30 aa


SEQ ID NO: 489


HepC1a segment 42


90 nts


SEQ ID NO: 490


Polypeptide encoded by SEQ ID NO: 489


30 aa


SEQ ID NO: 491


HepC1a segment 43


90 nts


SEQ ID NO: 492


Polypeptide encoded by SEQ ID NO: 491


30 aa


SEQ ID NO: 493


HepC1a segment 44


90 nts


SEQ ID NO: 494


Polypeptide encoded by SEQ ID NO: 493


30 aa


SEQ ID NO: 495


HepC1a segment 45


90 nts


SEQ ID NO: 496


Polypeptide encoded by SEQ ID NO: 495


30 aa


SEQ ID NO: 497


HepC1a segment 46


90 nts


SEQ ID NO: 498


Polypeptide encoded by SEQ ID NO: 497


30 aa


SEQ ID NO: 499


HepC1a segment 47


90 nts 


SEQ ID NO: 500 


Polypeptide encoded by SEQ ID NO: 499 


30 aa 


SEQ ID NO: 501 


HepC1a segment 48


90 nts 


SEQ ID NO: 502 


Polypeptide encoded by SEQ ID NO: 501 


30 aa


SEQ ID NO: 503


HepC1a segment 49


90 nts


SEQ ID NO: 504


Polypeptide encoded by SEQ ID NO: 503 


30 aa 


SEQ ID NO: 505 


HepC1a segment 50


90 nts 


SEQ ID NO: 506 


Polypeptide encoded by SEQ ID NO: 505 


30 aa 


SEQ ID NO: 507 


HepC1a segment 51


90 nts 


SEQ ID NO: 508 


Polypeptide encoded by SEQ ID NO: 507 


30 aa 


SEQ ID NO: 509 


HepC1a segment 52


90 nts 


SEQ ID NO: 510 


Polypeptide encoded by SEQ ID NO: 509 


30 aa 


SEQ ID NO: 511 


HepC1a segment 53


90 nts


SEQ ID NO: 512


Polypeptide encoded by SEQ ID NO: 511


30 aa


SEQ ID NO: 513


HepC1a segment 54


90 nts 


SEQ ID NO: 514 


Polypeptide encoded by SEQ ID NO: 513 


30 aa 


SEQ ID NO: 515 


HepC1a segment 55


90 nts 


SEQ ID NO: 516 


Polypeptide encoded by SEQ ID NO: 515 


30 aa 


SEQ ID NO: 517


HepC1a segment 56


90 nts


SEQ ID NO: 518


Polypeptide encoded by SEQ ID NO: 517


30 aa


SEQ ID NO: 519


HepC1a segment 57


90 nts


SEQ ID NO: 520


Polypeptide encoded by SEQ ID NO: 519


30 aa


SEQ ID NO: 521


HepC1a segment 58


90 nts


SEQ ID NO: 522 


Polypeptide encoded by SEQ ID NO: 521 


30 aa 


SEQ ID NO: 523 


HepC1a segment 59


90 nts 


SEQ ID NO: 524 


Polypeptide encoded by SEQ ID NO: 523 


30 aa 


SEQ ID NO: 525 


HepC1a segment 60


90 nts 


SEQ ID NO: 526 


Polypeptide encoded by SEQ ID NO: 525 


30 aa


SEQ ID NO: 527


HepC1a segment 61


90 nts


SEQ ID NO: 528


Polypeptide encoded by SEQ ID NO: 527


30 aa


SEQ ID NO: 529


HepC1a segment 62


90 nts


SEQ ID NO: 530


Polypeptide encoded by SEQ ID NO: 529


30 aa


SEQ ID NO: 531


HepC1a segment 63


90 nts


SEQ ID NO: 532


Polypeptide encoded by SEQ ID NO: 531


30 aa


SEQ ID NO: 533


HepC1a segment 64


90 nts


SEQ ID NO: 534


Polypeptide encoded by SEQ ID NO: 533


30 aa


SEQ ID NO: 535


HepC1a segment 65


90 nts


SEQ ID NO: 536


Polypeptide encoded by SEQ ID NO: 535


30 aa


SEQ ID NO: 537


HepC1a segment 66


90 nts


SEQ ID NO: 538


Polypeptide encoded by SEQ ID NO: 537


30 aa


SEQ ID NO: 539


HepC1a segment 67


90 nts


SEQ ID NO: 540


Polypeptide encoded by SEQ ID NO: 539


30 aa


SEQ ID NO: 541


HepC1a segment 68


90 nts


SEQ ID NO: 542


Polypeptide encoded by SEQ ID NO: 541


30 aa


SEQ ID NO: 543


HepC1a segment 69


90 nts


SEQ ID NO: 544


Polypeptide encoded by SEQ ID NO: 543


30 aa


SEQ ID NO: 545


HepC1a segment 70


90 nts


SEQ ID NO: 546


Polypeptide encoded by SEQ ID NO: 545


30 aa


SEQ ID NO: 547


HepC1a segment 71


90 nts


SEQ ID NO: 548


Polypeptide encoded by SEQ ID NO: 547


30 aa


SEQ ID NO: 549


HepC1a segment 72


90 nts 


SEQ ID NO: 550 


Polypeptide encoded by SEQ ID NO: 549 


30 aa


SEQ ID NO: 551


HepC1a segment 73


90 nts


SEQ ID NO: 552


Polypeptide encoded by SEQ ID NO: 551 


30 aa 


SEQ ID NO: 553 


HepC1a segment 74


90 nts 


SEQ ID NO: 554 


Polypeptide encoded by SEQ ID NO: 553 


30 aa 


SEQ ID NO: 555 


HepC1a segment 75


90 nts 


SEQ ID NO: 556 


Polypeptide encoded by SEQ ID NO: 555 


30 aa 


SEQ ID NO: 557 


HepC1a segment 76


90 nts 


SEQ ID NO: 558 


Polypeptide encoded by SEQ ID NO: 557 


30 aa


SEQ ID NO: 559


HepC1a segment 77


90 nts 


SEQ ID NO: 560 


Polypeptide encoded by SEQ ID NO: 559 


30 aa 


SEQ ID NO: 561 


HepC1a segment 78


90 nts


SEQ ID NO: 562


Polypeptide encoded by SEQ ID NO: 561 


30 aa 


SEQ ID NO: 563 


HepC1a segment 79


90 nts


SEQ ID NO: 564


Polypeptide encoded by SEQ ID NO: 563


30 aa


SEQ ID NO: 565


HepC1a segment 80


90 nts


SEQ ID NO: 566


Polypeptide encoded by SEQ ID NO: 565


30 aa


SEQ ID NO: 567


HepC1a segment 81


90 nts


SEQ ID NO: 568


Polypeptide encoded by SEQ ID NO: 567


30 aa


SEQ ID NO: 569


HepC1a segment 82


90 nts 


SEQ ID NO: 570 


Polypeptide encoded by SEQ ID NO: 569 


30 aa 


SEQ ID NO: 571 


HepC1a segment 83


90 nts 


SEQ ID NO: 572 


Polypeptide encoded by SEQ ID NO: 571 


30 aa 


SEQ ID NO: 573 


HepC1a segment 84


90 nts 


SEQ ID NO: 574 


Polypeptide encoded by SEQ ID NO: 573 


30 aa


SEQ ID NO: 575


HepC1a segment 85


90 nts 


SEQ ID NO: 576 


Polypeptide encoded by SEQ ID NO: 575 


30 aa 


SEQ ID NO: 577


HepC1a segment 86


90 nts 


SEQ ID NO: 578 


Polypeptide encoded by SEQ ID NO: 577 


30 aa 


SEQ ID NO: 579 


HepC1a segment 87


90 nts 


SEQ ID NO: 580 


Polypeptide encoded by SEQ ID NO: 579 


30 aa 


SEQ ID NO: 581 


HepC1a segment 88


90 nts 


SEQ ID NO: 582 


Polypeptide encoded by SEQ ID NO: 581 


30 aa 


SEQ ID NO: 583 


HepC1a segment 89


90 nts 


SEQ ID NO: 584 


Polypeptide encoded by SEQ ID NO: 583 


30 aa 


SEQ ID NO: 585


HepC1a segment 90


90 nts 


SEQ ID NO: 586 


Polypeptide encoded by SEQ ID NO: 585 


30 aa 


SEQ ID NO: 587 


HepC1a segment 91


90 nts


SEQ ID NO: 588


Polypeptide encoded by SEQ ID NO: 587 


30 aa 


SEQ ID NO: 589 


HepC1a segment 92


90 nts


SEQ ID NO: 590


Polypeptide encoded by SEQ ID NO: 589


30 aa


SEQ ID NO: 591


HepC1a segment 93


90 nts


SEQ ID NO: 592


Polypeptide encoded by SEQ ID NO: 591


30 aa


SEQ ID NO: 593


HepC1a segment 94


90 nts


SEQ ID NO: 594 


Polypeptide encoded by SEQ ID NO: 593 


30 aa 


SEQ ID NO: 595 


HepC1a segment 95


90 nts 


SEQ ID NO: 596 


Polypeptide encoded by SEQ ID NO: 595 


30 aa 


SEQ ID NO: 597 


HepC1a segment 96


90 nts 


SEQ ID NO: 598 


Polypeptide encoded by SEQ ID NO: 597 


30 aa


SEQ ID NO: 599


HepC1a segment 97


90 nts


SEQ ID NO: 600


Polypeptide encoded by SEQ ID NO: 599


30 aa


SEQ ID NO: 601


HepC1a segment 98


90 nts


SEQ ID NO: 602


Polypeptide encoded by SEQ ID NO: 601


30 aa


SEQ ID NO: 603


HepC1a segment 99


90 nts


SEQ ID NO: 604


Polypeptide encoded by SEQ ID NO: 603


30 aa


SEQ ID NO: 605


HepC1a segment 100


90 nts


SEQ ID NO: 606


Polypeptide encoded by SEQ ID NO: 605


30 aa


SEQ ID NO: 607


HepC1a segment 101


90 nts


SEQ ID NO: 608


Polypeptide encoded by SEQ ID NO: 607


30 aa


SEQ ID NO: 609


HepC1a segment 102


90 nts


SEQ ID NO: 610


Polypeptide encoded by SEQ ID NO: 609


30 aa


SEQ ID NO: 611


HepC1a segment 103


90 nts


SEQ ID NO: 612


Polypeptide encoded by SEQ ID NO: 611


30 aa


SEQ ID NO: 613


HepC1a segment 104


90 nts


SEQ ID NO: 614


Polypeptide encoded by SEQ ID NO: 613


30 aa


SEQ ID NO: 615


HepC1a segment 105


90 nts


SEQ ID NO: 616


Polypeptide encoded by SEQ ID NO: 615


30 aa


SEQ ID NO: 617


HepC1a segment 106


90 nts


SEQ ID NO: 618


Polypeptide encoded by SEQ ID NO: 617


30 aa


SEQ ID NO: 619


HepC1a segment 107


90 nts


SEQ ID NO: 620


Polypeptide encoded by SEQ ID NO: 619


30 aa


SEQ ID NO: 621


HepC1a segment 108


90 nts


SEQ ID NO: 622


Polypeptide encoded by SEQ ID NO: 621 


30 aa


SEQ ID NO: 623


HepC1a segment 109


90 nts


SEQ ID NO: 624


Polypeptide encoded by SEQ ID NO: 623


30 aa


SEQ ID NO: 625


HepC1a segment 110


90 nts


SEQ ID NO: 626


Polypeptide encoded by SEQ ID NO: 625


30 aa


SEQ ID NO: 627


HepC1a segment 111


90 nts


SEQ ID NO: 628


Polypeptide encoded by SEQ ID NO: 627


30 aa


SEQ ID NO: 629


HepC1a segment 112


90 nts


SEQ ID NO: 630


Polypeptide encoded by SEQ ID NO: 629


30 aa


SEQ ID NO: 631


HepC1a segment 113


90 nts


SEQ ID NO: 632


Polypeptide encoded by SEQ ID NO: 631


30 aa


SEQ ID NO: 633


HepC1a segment 114


90 nts


SEQ ID NO: 634


Polypeptide encoded by SEQ ID NO: 633


30 aa


SEQ ID NO: 635


HepC1a segment 115


90 nts


SEQ ID NO: 636


Polypeptide encoded by SEQ ID NO: 635


30 aa


SEQ ID NO: 637


HepC1a segment 116


90 nts


SEQ ID NO: 638


Polypeptide encoded by SEQ ID NO: 637


30 aa


SEQ ID NO: 639


HepC1a segment 117


90 nts


SEQ ID NO: 640


Polypeptide encoded by SEQ ID NO: 639


30 aa


SEQ ID NO: 641


HepC1a segment 118


90 nts


SEQ ID NO: 642


Polypeptide encoded by SEQ ID NO: 641


30 aa


SEQ ID NO: 643


HepC1a segment 119


90 nts


SEQ ID NO: 644


Polypeptide encoded by SEQ ID NO: 643


30 aa


SEQ ID NO: 645


HepC1a segment 120


90 nts 


SEQ ID NO: 646 


Polypeptide encoded by SEQ ID NO: 645 


30 aa


SEQ ID NO: 647


HepC1a segment 121


90 nts


SEQ ID NO: 648


Polypeptide encoded by SEQ ID NO: 647


30 aa


SEQ ID NO: 649


HepC1a segment 122


90 nts


SEQ ID NO: 650


Polypeptide encoded by SEQ ID NO: 649


30 aa


SEQ ID NO: 651


HepC1a segment 123


90 nts


SEQ ID NO: 652


Polypeptide encoded by SEQ ID NO: 651


30 aa


SEQ ID NO: 653


HepC1a segment 124


90 nts


SEQ ID NO: 654


Polypeptide encoded by SEQ ID NO: 653


30 aa


SEQ ID NO: 655


HepC1a segment 125


90 nts


SEQ ID NO: 656


Polypeptide encoded by SEQ ID NO: 655


30 aa


SEQ ID NO: 657


HepC1a segment 126


90 nts


SEQ ID NO: 658


Polypeptide encoded by SEQ ID NO: 657


30 aa


SEQ ID NO: 659


HepC1a segment 127


90 nts


SEQ ID NO: 660


Polypeptide encoded by SEQ ID NO: 659


30 aa


SEQ ID NO: 661


HepC1a segment 128


90 nts


SEQ ID NO: 662


Polypeptide encoded by SEQ ID NO: 661


30 aa


SEQ ID NO: 663


HepC1a segment 129


90 nts


SEQ ID NO: 664


Polypeptide encoded by SEQ ID NO: 663


30 aa


SEQ ID NO: 665


HepC1a segment 130


90 nts 


SEQ ID NO: 666 


Polypeptide encoded by SEQ ID NO: 665 


30 aa 


SEQ ID NO: 667 


HepC1a segment 131


90 nts 


SEQ ID NO: 668 


Polypeptide encoded by SEQ ID NO: 667 


30 aa 


SEQ ID NO: 669 


HepC1a segment 132


90 nts 


SEQ ID NO: 670 


Polypeptide encoded by SEQ ID NO: 669 


30 aa


SEQ ID NO: 671


HepC1a segment 133


90 nts


SEQ ID NO: 672


Polypeptide encoded by SEQ ID NO: 671


30 aa


SEQ ID NO: 673


HepC1a segment 134


90 nts


SEQ ID NO: 674


Polypeptide encoded by SEQ ID NO: 673


30 aa


SEQ ID NO: 675


HepC1a segment 135


90 nts


SEQ ID NO: 676


Polypeptide encoded by SEQ ID NO: 675 


30 aa 


SEQ ID NO: 677 


HepC1a segment 136


90 nts 


SEQ ID NO: 678 


Polypeptide encoded by SEQ ID NO: 677 


30 aa 


SEQ ID NO: 679 


HepC1a segment 137


90 nts 


SEQ ID NO: 680 


Polypeptide encoded by SEQ ID NO: 679 


30 aa 


SEQ ID NO: 681 


HepC1a segment 138


90 nts 


SEQ ID NO: 682 


Polypeptide encoded by SEQ ID NO: 681 


30 aa 


SEQ ID NO: 683


HepC1a segment 139


90 nts


SEQ ID NO: 684


Polypeptide encoded by SEQ ID NO: 683


30 aa


SEQ ID NO: 685


HepC1a segment 140


90 nts


SEQ ID NO: 686


Polypeptide encoded by SEQ ID NO: 685


30 aa


SEQ ID NO: 687


HepC1a segment 141


90 nts


SEQ ID NO: 688


Polypeptide encoded by SEQ ID NO: 687


30 aa


SEQ ID NO: 689


HepC1a segment 142


90 nts


SEQ ID NO: 690


Polypeptide encoded by SEQ ID NO: 689


30 aa


SEQ ID NO: 691


HepC1a segment 143


90 nts 


SEQ ID NO: 692 


Polypeptide encoded by SEQ ID NO: 691 


30 aa 


SEQ ID NO: 693 


HepC1a segment 144


90 nts 


SEQ ID NO: 694 


Polypeptide encoded by SEQ ID NO: 693 


30 aa


SEQ ID NO: 695


HepC1a segment 145


90 nts


SEQ ID NO: 696 


Polypeptide encoded by SEQ ID NO: 695 


30 aa 


SEQ ID NO: 697


HepC1a segment 146


90 nts


SEQ ID NO: 698


Polypeptide encoded by SEQ ID NO: 697


30 aa


SEQ ID NO: 699


HepC1a segment 147


90 nts


SEQ ID NO: 700


Polypeptide encoded by SEQ ID NO: 699


30 aa


SEQ ID NO: 701


HepC1a segment 148


90 nts


SEQ ID NO: 702


Polypeptide encoded by SEQ ID NO: 701


30 aa


SEQ ID NO: 703


HepC1a segment 149


90 nts


SEQ ID NO: 704


Polypeptide encoded by SEQ ID NO: 703


30 aa


SEQ ID NO: 705


HepC1a segment 150


90 nts


SEQ ID NO: 706


Polypeptide encoded by SEQ ID NO: 705


30 aa


SEQ ID NO: 707


HepC1a segment 151


90 nts


SEQ ID NO: 708


Polypeptide encoded by SEQ ID NO: 707


30 aa


SEQ ID NO: 709


HepC1a segment 152


90 nts


SEQ ID NO: 710


Polypeptide encoded by SEQ ID NO: 709


30 aa


SEQ ID NO: 711


HepC1a segment 153


90 nts


SEQ ID NO: 712


Polypeptide encoded by SEQ ID NO: 711


30 aa 


SEQ ID NO: 713 


HepC1a segment 154


90 nts 


SEQ ID NO: 714 


Polypeptide encoded by SEQ ID NO: 713 


30 aa 


SEQ ID NO: 715 


HepC1a segment 155


90 nts


SEQ ID NO: 716


Polypeptide encoded by SEQ ID NO: 715


30 aa


SEQ ID NO: 717


HepC1a segment 156


90 nts 


SEQ ID NO: 718 


Polypeptide encoded by SEQ ID NO: 717 


30 aa


SEQ ID NO: 719


HepC1a segment 157


90 nts


SEQ ID NO: 720


Polypeptide encoded by SEQ ID NO: 719


30 aa


SEQ ID NO: 721


HepC1a segment 158


90 nts


SEQ ID NO: 722


Polypeptide encoded by SEQ ID NO: 721


30 aa


SEQ ID NO: 723


HepC1a segment 159


90 nts


SEQ ID NO: 724


Polypeptide encoded by SEQ ID NO: 723


30 aa


SEQ ID NO: 725


HepC1a segment 160


90 nts


SEQ ID NO: 726


Polypeptide encoded by SEQ ID NO: 725


30 aa


SEQ ID NO: 727


HepC1a segment 161


90 nts


SEQ ID NO: 728


Polypeptide encoded by SEQ ID NO: 727


30 aa


SEQ ID NO: 729


HepC1a segment 162


90 nts


SEQ ID NO: 730


Polypeptide encoded by SEQ ID NO: 729


30 aa


SEQ ID NO: 731


HepC1a segment 163


90 nts


SEQ ID NO: 732


Polypeptide encoded by SEQ ID NO: 731


30 aa


SEQ ID NO: 733


HepC1a segment 164


90 nts


SEQ ID NO: 734


Polypeptide encoded by SEQ ID NO: 733


30 aa


SEQ ID NO: 735


HepC1a segment 165


90 nts


SEQ ID NO: 736


Polypeptide encoded by SEQ ID NO: 735


30 aa


SEQ ID NO: 737


HepC1a segment 166


90 nts


SEQ ID NO: 738


Polypeptide encoded by SEQ ID NO: 737


30 aa


SEQ ID NO: 739


HepC1a segment 167


90 nts


SEQ ID NO: 740


Polypeptide encoded by SEQ ID NO: 739


30 aa


SEQ ID NO: 741


HepC1a segment 168


90 nts 


SEQ ID NO: 742 


Polypeptide encoded by SEQ ID NO: 741 


30 aa


SEQ ID NO: 743


HepC1a segment 169


90 nts


SEQ ID NO: 744


Polypeptide encoded by SEQ ID NO: 743


30 aa


SEQ ID NO: 745


HepC1a segment 170


90 nts 


SEQ ID NO: 746 


Polypeptide encoded by SEQ ID NO: 745 


30 aa 


SEQ ID NO: 747 


HepC1a segment 171


90 nts 


SEQ ID NO: 748 


Polypeptide encoded by SEQ ID NO: 747 


30 aa 


SEQ ID NO: 749 


HepC1a segment 172


90 nts 


SEQ ID NO: 750 


Polypeptide encoded by SEQ ID NO: 749 


30 aa 


SEQ ID NO: 751 


HepC1a segment 173


90 nts 


SEQ ID NO: 752 


Polypeptide encoded by SEQ ID NO: 751 


30 aa 


SEQ ID NO: 753 


HepC1a segment 174


90 nts 


SEQ ID NO: 754 


Polypeptide encoded by SEQ ID NO: 753 


30 aa 


SEQ ID NO: 755 


HepC1a segment 175


90 nts 


SEQ ID NO: 756 


Polypeptide encoded by SEQ ID NO: 755 


30 aa 


SEQ ID NO: 757 


HepC1a segment 176


90 nts


SEQ ID NO: 758


Polypeptide encoded by SEQ ID NO: 757


30 aa


SEQ ID NO: 759


HepC1a segment 177


90 nts


SEQ ID NO: 760


Polypeptide encoded by SEQ ID NO: 759


30 aa


SEQ ID NO: 761


HepC1a segment 178


90 nts


SEQ ID NO: 762


Polypeptide encoded by SEQ ID NO: 761


30 aa


SEQ ID NO: 763


HepC1a segment 179


90 nts


SEQ ID NO: 764


Polypeptide encoded by SEQ ID NO: 763


30 aa


SEQ ID NO: 765


HepC1a segment 180


90 nts


SEQ ID NO: 766


Polypeptide encoded by SEQ ID NO: 765 


30 aa


SEQ ID NO: 767


HepC1a segment 181


90 nts


SEQ ID NO: 768


Polypeptide encoded by SEQ ID NO: 767


30 aa


SEQ ID NO: 769


HepC1a segment 182


90 nts


SEQ ID NO: 770


Polypeptide encoded by SEQ ID NO: 769


30 aa


SEQ ID NO: 771


HepC1a segment 183


90 nts


SEQ ID NO: 772


Polypeptide encoded by SEQ ID NO: 771


30 aa


SEQ ID NO: 773


HepC1a segment 184


90 nts


SEQ ID NO: 774


Polypeptide encoded by SEQ ID NO: 773


30 aa


SEQ ID NO: 775


HepC1a segment 185


90 nts 


SEQ ID NO: 776 


Polypeptide encoded by SEQ ID NO: 775 


30 aa 


SEQ ID NO: 777 


HepC1a segment 186


90 nts 


SEQ ID NO: 778 


Polypeptide encoded by SEQ ID NO: 777 


30 aa 


SEQ ID NO: 779


HepC1a segment 187


90 nts


SEQ ID NO: 780


Polypeptide encoded by SEQ ID NO: 779


30 aa


SEQ ID NO: 781


HepC1a segment 188


90 nts


SEQ ID NO: 782


Polypeptide encoded by SEQ ID NO: 781


30 aa


SEQ ID NO: 783


HepC1a segment 189


90 nts


SEQ ID NO: 784


Polypeptide encoded by SEQ ID NO: 783


30 aa


SEQ ID NO: 785


HepC1a segment 190


90 nts


SEQ ID NO: 786


Polypeptide encoded by SEQ ID NO: 785


30 aa


SEQ ID NO: 787


HepC1a segment 191


90 nts 


SEQ ID NO: 788 


Polypeptide encoded by SEQ ID NO: 787 


30 aa 


SEQ ID NO: 789 


HepC1a segment 192


90 nts 


SEQ ID NO: 790 


Polypeptide encoded by SEQ ID NO: 789 


30 aa


SEQ ID NO: 791


HepC1a segment 193


90 nts


SEQ ID NO: 792


Polypeptide encoded by SEQ ID NO: 791


30 aa


SEQ ID NO: 793


HepC1a segment 194


90 nts


SEQ ID NO: 794


Polypeptide encoded by SEQ ID NO: 793


30 aa


SEQ ID NO: 795


HepC1a segment 195


90 nts


SEQ ID NO: 796


Polypeptide encoded by SEQ ID NO: 795


30 aa


SEQ ID NO: 797


HepC1a segment 196


90 nts


SEQ ID NO: 798


Polypeptide encoded by SEQ ID NO: 797


30 aa


SEQ ID NO: 799


HepC1a segment 197


90 nts


SEQ ID NO: 800


Polypeptide encoded by SEQ ID NO: 799


30 aa


SEQ ID NO: 801


HepC1a segment 198


90 nts


SEQ ID NO: 802


Polypeptide encoded by SEQ ID NO: 801


30 aa


SEQ ID NO: 803


HepC1a segment 199


90 nts


SEQ ID NO: 804


Polypeptide encoded by SEQ ID NO: 803


30 aa


SEQ ID NO: 805


HepC1a segment 200


90 nts


SEQ ID NO: 806


Polypeptide encoded by SEQ ID NO: 805


30 aa


SEQ ID NO: 807


HepC1a segment 201


*rv nis 


SEQ ID NO: 808


Polypeptide encoded by SEQ ID NO: 807


i j aa 


SEQ ID NO: 809


HepC1a scrambled


17955 nts 


SEQ ID NO: 810 


Polypeptide encoded by SEQ ID NO: 809 


5985 aa 


SEQ ID NO: 811 


HepC Cassette A 


6065 nts 


SEQ ID NO: 812 


Polypeptide encoded by SEQ ID NO: 811


2011 aa 


SEQ ID NO: 813 


HepC Cassette B 


6069 nts 


SEQ ID NO: 814 


Polypeptide encoded by SEQ ID NO: 813 


2010 aa


SEQ ID NO: 815


HepC Cassette C


6030 nts


SEQ ID NO: 816


Polypeptide encoded by SEQ ID NO: 815


1997 aa


SEQ ID NO: 817


gp100 consensus polypeptide


661 aa


SEQ ID NO: 818


MART consensus polypeptide


118 aa


SEQ ID NO: 819


TRP-1 consensus polypeptide


248 aa


SEQ ID NO: 820


Tyros consensus polypeptide


529 aa


SEQ ID NO: 821


TRP2 consensus polypeptide


519 aa


SEQ ID NO: 822


MC1R consensus polypeptide


317 aa


SEQ ID NO: 823


MUC1F consensus polypeptide


125 aa


SEQ ID NO: 824


MUC1R consensus polypeptide


312 aa


SEQ ID NO: 825


BAGE consensus polypeptide


43 aa


SEQ ID NO: 826


GAGE-1 consensus polypeptide


138 aa


SEQ ID NO: 827


gp100In4 consensus polypeptide


51 aa


SEQ ID NO: 828


MAGE-1 consensus polypeptide


309 aa


SEQ ID NO: 829


MAGE-3 consensus polypeptide


314 aa


SEQ ID NO: 830


PRAME consensus polypeptide


509 aa


SEQ ID NO: 831


TRP2IN2 consensus polypeptide


54 aa


SEQ ID NO: 832


NYNSO1a consensus polypeptide


180 aa


SEQ ID NO: 833


NYNSO1b consensus polypeptide


58 aa


SEQ ID NO: 834


LAGE1 consensus polypeptide


180 aa


SEQ ID NO: 835


gp100 segment 1


90 nts


SEQ ID NO: 836


Polypeptide encoded by SEQ ID NO: 835


30 aa


SEQ ID NO: 837


gp100 segment 2


90 nts 


SEQ ID NO: 838 


Polypeptide encoded by SEQ ID NO: 837 


30 aa


SEQ ID NO: 839


gp100 segment 3


90 nts


SEQ ID NO: 840


Polypeptide encoded by SEQ ID NO: 839


30 aa


SEQ ID NO: 841


gp100 segment 4


90 nts


SEQ ID NO: 842


Polypeptide encoded by SEQ ID NO: 841


30 aa


SEQ ID NO: 843


gp100 segment 5


90 nts


SEQ ID NO: 844


Polypeptide encoded by SEQ ID NO: 843


30 aa


SEQ ID NO: 845


gp100 segment 6


90 nts


SEQ ID NO: 846


Polypeptide encoded by SEQ ID NO: 845


30 aa


SEQ ID NO: 847


gp100 segment 7


90 nts


SEQ ID NO: 848


Polypeptide encoded by SEQ ID NO: 847


30 aa


SEQ ID NO: 849


gp100 segment 8


90 nts


SEQ ID NO: 850


Polypeptide encoded by SEQ ID NO: 849


30 aa


SEQ ID NO: 851


gp100 segment 9


90 nts


SEQ ID NO: 852


Polypeptide encoded by SEQ ID NO: 851


30 aa


SEQ ID NO: 853


gp100 segment 10


90 nts


SEQ ID NO: 854


Polypeptide encoded by SEQ ID NO: 853


30 aa


SEQ ID NO: 855


gp100 segment 11


90 nts


SEQ ID NO: 856


Polypeptide encoded by SEQ ID NO: 855


30 aa


SEQ ID NO: 857


gp100 segment 12


90 nts


SEQ ID NO: 858


Polypeptide encoded by SEQ ID NO: 857


30 aa


SEQ ID NO: 859


gp100 segment 13


90 nts


SEQ ID NO: 860


Polypeptide encoded by SEQ ID NO: 859


30 aa


SEQ ID NO: 861


gp100 segment 14


90 nts 


SEQ ID NO: 862 


Polypeptide encoded by SEQ ID NO: 861 


30 aa


SEQ ID NO: 863


gp100 segment 15


90 nts


SEQ ID NO: 864


Polypeptide encoded by SEQ ID NO: 863


30 aa


SEQ ID NO: 865


gp100 segment 16


90 nts


SEQ ID NO: 866


Polypeptide encoded by SEQ ID NO: 865


30 aa


SEQ ID NO: 867


gp100 segment 17


90 nts


SEQ ID NO: 868


Polypeptide encoded by SEQ ID NO: 867


30 aa


SEQ ID NO: 869


gp100 segment 18


90 nts


SEQ ID NO: 870


Polypeptide encoded by SEQ ID NO: 869


30 aa


SEQ ID NO: 871


gp100 segment 19


90 nts


SEQ ID NO: 872


Polypeptide encoded by SEQ ID NO: 871


30 aa


SEQ ID NO: 873


gp100 segment 20


90 nts


SEQ ID NO: 874


Polypeptide encoded by SEQ ID NO: 873


30 aa


SEQ ID NO: 875


gp100 segment 21


90 nts


SEQ ID NO: 876


Polypeptide encoded by SEQ ID NO: 875


30 aa


SEQ ID NO: 877


gp100 segment 22


90 nts


SEQ ID NO: 878


Polypeptide encoded by SEQ ID NO: 877


30 aa


SEQ ID NO: 879


gp100 segment 23


90 nts


SEQ ID NO: 880


Polypeptide encoded by SEQ ID NO: 879


30 aa


SEQ ID NO: 881


gp100 segment 24


90 nts


SEQ ID NO: 882


Polypeptide encoded by SEQ ID NO: 881


30 aa


SEQ ID NO: 883


gp100 segment 25


90 nts


SEQ ID NO: 884


Polypeptide encoded by SEQ ID NO: 883


30 aa


SEQ ID NO: 885


gp100 segment 26


90 nts


SEQ ID NO: 886


Polypeptide encoded by SEQ ID NO: 885 


30 aa


SEQ ID NO: 887


gp100 segment 27


90 nts


SEQ ID NO: 888


Polypeptide encoded by SEQ ID NO: 887


30 aa


SEQ ID NO: 889


gp100 segment 28


90 nts


SEQ ID NO: 890


Polypeptide encoded by SEQ ID NO: 889


30 aa


SEQ ID NO: 891


gp100 segment 29


90 nts


SEQ ID NO: 892


Polypeptide encoded by SEQ ID NO: 891


30 aa


SEQ ID NO: 893


gp100 segment 30


90 nts


SEQ ID NO: 894


Polypeptide encoded by SEQ ID NO: 893


30 aa


SEQ ID NO: 895


gp100 segment 31


90 nts


SEQ ID NO: 896


Polypeptide encoded by SEQ ID NO: 895


30 aa


SEQ ID NO: 897


gp100 segment 32


90 nts


SEQ ID NO: 898


Polypeptide encoded by SEQ ID NO: 897


30 aa


SEQ ID NO: 899


gp100 segment 33


90 nts


SEQ ID NO: 900


Polypeptide encoded by SEQ ID NO: 899


30 aa


SEQ ID NO: 901


gp100 segment 34


90 nts


SEQ ID NO: 902


Polypeptide encoded by SEQ ID NO: 901


30 aa


SEQ ID NO: 903


gp100 segment 35


90 nts


SEQ ID NO: 904


Polypeptide encoded by SEQ ID NO: 903


30 aa


SEQ ID NO: 905


gp100 segment 36


90 nts


SEQ ID NO: 906


Polypeptide encoded by SEQ ID NO: 905


30 aa


SEQ ID NO: 907


gp100 segment 37


90 nts


SEQ ID NO: 908


Polypeptide encoded by SEQ ID NO: 907


30 aa


SEQ ID NO: 909


gp100 segment 38


90 nts 


SEQ ID NO: 910 


Polypeptide encoded by SEQ ID NO: 909 


30 aa


SEQ ID NO: 911


gp100 segment 39


90 nts


SEQ ID NO: 912


Polypeptide encoded by SEQ ID NO: 911


30 aa


SEQ ID NO: 913


gp100 segment 40


90 nts


SEQ ID NO: 914


Polypeptide encoded by SEQ ID NO: 913


30 aa


SEQ ID NO: 915


gp100 segment 41


90 nts


SEQ ID NO: 916


Polypeptide encoded by SEQ ID NO: 915


30 aa


SEQ ID NO: 917


gp100 segment 42


90 nts


SEQ ID NO: 918


Polypeptide encoded by SEQ ID NO: 917


30 aa


SEQ ID NO: 919


gp100 segment 43


90 nts


SEQ ID NO: 920


Polypeptide encoded by SEQ ID NO: 919


30 aa


SEQ ID NO: 921


gp100 segment 44


60 nts


SEQ ID NO: 922


Polypeptide encoded by SEQ ID NO: 921


20 aa


SEQ ID NO: 923


MART segment 1


90 nts


SEQ ID NO: 924


Polypeptide encoded by SEQ ID NO: 923


30 aa


SEQ ID NO: 925


MART segment 2


90 nts


SEQ ID NO: 926


Polypeptide encoded by SEQ ID NO: 925


30 aa


SEQ ID NO: 927


MART segment 3


90 nts


SEQ ID NO: 928


Polypeptide encoded by SEQ ID NO: 927


30 aa


SEQ ID NO: 929


MART segment 4


90 nts


SEQ ID NO: 930


Polypeptide encoded by SEQ ID NO: 929


30 aa


SEQ ID NO: 931


MART segment 5


90 nts


SEQ ID NO: 932


Polypeptide encoded by SEQ ID NO: 931


30 aa


SEQ ID NO: 933


MART segment 6


90 nts


SEQ ID NO: 934


Polypeptide encoded by SEQ ID NO: 933 


30 aa


SEQ ID NO: 935


MART segment 7


90 nts


SEQ ID NO: 936


Polypeptide encoded by SEQ ID NO: 935


30 aa


SEQ ID NO: 937


MART segment 8


51 nts 


SEQ ID NO: 938 


Polypeptide encoded by SEQ ID NO: 937 


17 aa 


SEQ ID NO: 939 


trp-1 segment 1 


90 nts 


SEQ ID NO: 940 


Polypeptide encoded by SEQ ID NO: 939 


30 aa 


SEQ ID NO: 941 


trp-1 segment 2 


90 nts 


SEQ ID NO: 942 


Polypeptide encoded by SEQ ID NO: 941 


30 aa 


SEQ ID NO: 943 


trp-1 segment 3 


90 nts 


SEQ ID NO: 944 


Polypeptide encoded by SEQ ID NO: 943 


30 aa 


SEQ ID NO: 945 


trp-1 segment 4 


90 nts 


SEQ ID NO: 946 


Polypeptide encoded by SEQ ID NO: 945 


30 aa 


SEQ ID NO: 947 


trp-1 segment 5 


90 nts 


SEQ ID NO: 948 


Polypeptide encoded by SEQ ID NO: 947 


30 aa 


SEQ ID NO: 949 


trp-1 segment 6 


90 nts


SEQ ID NO: 950


Polypeptide encoded by SEQ ID NO: 949


30 aa


SEQ ID NO: 951


trp-1 segment 7


90 nts


SEQ ID NO: 952


Polypeptide encoded by SEQ ID NO: 951


30 aa


SEQ ID NO: 953


trp-1 segment 8


90 nts


SEQ ID NO: 954 


Polypeptide encoded by SEQ ID NO: 953 


30 aa 


SEQ ID NO: 955 


trp-1 segment 9 


90 nts 


SEQ ID NO: 956 


Polypeptide encoded by SEQ ID NO: 955 


30 aa 


SEQ ID NO: 957 


trp-1 segment 10 


90 nts 


SEQ ID NO: 958 


Polypeptide encoded by SEQ ID NO: 957 


30 aa


SEQ ID NO: 959


trp-1 segment 11 


90 nts 


SEQ ID NO: 960 


Polypeptide encoded by SEQ ID NO: 959 


30 aa 


SEQ ID NO: 961 


trp-1 segment 12 


90 nts 


SEQ ID NO: 962 


Polypeptide encoded by SEQ ID NO: 961 


30 aa 


SEQ ID NO: 963


trp-1 segment 13 


90 nts 


SEQ ID NO: 964 


Polypeptide encoded by SEQ ID NO: 963 


30 aa 


SEQ ID NO: 965 


trp-1 segment 14


90 nts 


SEQ ID NO: 966 


Polypeptide encoded by SEQ ID NO: 965 


30 aa 


SEQ ID NO: 967 


trp-1 segment 15 


90 nts 


SEQ ID NO: 968 


Polypeptide encoded by SEQ ID NO: 967 


30 aa 


SEQ ID NO: 969 


trp-1 segment 16 


81 nts 


SEQ ID NO: 970 


Polypeptide encoded by SEQ ID NO: 969 


27 aa 


SEQ ID NO: 971 


tyros segment 1 


90 nts 


SEQ ID NO: 972 


Polypeptide encoded by SEQ ID NO: 971 


30 aa 


SEQ ID NO: 973 


tyros segment 2 


90 nts 


SEQ ID NO: 974 


Polypeptide encoded by SEQ ID NO: 973 


30 aa 


SEQ ID NO: 975 


tyros segment 3 


90 nts 


SEQ ID NO: 976 


Polypeptide encoded by SEQ ID NO: 975 


30 aa 


SEQ ID NO: 977


tyros segment 4


90 nts 


SEQ ID NO: 978 


Polypeptide encoded by SEQ ID NO: 977 


30 aa 


SEQ ID NO: 979 


tyros segment 5 


90 nts 


SEQ ID NO: 980 


Polypeptide encoded by SEQ ID NO: 979


30 aa 


SEQ ID NO: 981 


tyros segment 6 


90 nts 


SEQ ID NO: 982 


Polypeptide encoded by SEQ ID NO: 981


30 aa


SEQ ID NO: 983


tyros segment 7 


90 nts 


SEQ ID NO: 984 


Polypeptide encoded by SEQ ID NO: 983 


30 aa 


SEQ ID NO: 985


tyros segment 8 


90 nts 


SEQ ID NO: 986 


Polypeptide encoded by SEQ ID NO: 985 


30 aa 


SEQ ID NO: 987 


tyros segment 9 


90 nts 


SEQ ID NO: 988 


Polypeptide encoded by SEQ ID NO: 987 


30 aa 


SEQ ID NO: 989 


tyros segment 10 


90 nts 


SEQ ID NO: 990


Polypeptide encoded by SEQ ID NO: 989 


30 aa 


SEQ ID NO: 991 


tyros segment 11


90 nts 


SEQ ID NO: 992 


Polypeptide encoded by SEQ ID NO: 991 


30 aa 


SEQ ID NO: 993 


tyros segment 12 


90 nts 


SEQ ID NO: 994 


Polypeptide encoded by SEQ ID NO: 993 


30 aa 


SEQ ID NO: 995 


tyros segment 13 


90 nts 


SEQ ID NO: 996 


Polypeptide encoded by SEQ ID NO: 995 


30 aa 


SEQ ID NO: 997 


tyros segment 14 


90 nts 


SEQ ID NO: 998 


Polypeptide encoded by SEQ ID NO: 997 


30 aa 


SEQ ID NO: 999 


tyros segment 15 


90 nts 


SEQ ID NO: 1000 


Polypeptide encoded by SEQ ID NO: 999


30 aa 


SEQ ID NO: 1001


tyros segment 16 


90 nts 


SEQ ID NO: 1002 


Polypeptide encoded by SEQ ID NO: 1001


30 aa 


SEQ ID NO: 1003 


tyros segment 17 


90 nts 


SEQ ID NO: 1004 


Polypeptide encoded by SEQ ID NO: 1003 


30 aa 


SEQ ID NO: 1005 


tyros segment 18 


90 nts 


SEQ ID NO: 1006 


Polypeptide encoded by SEQ ID NO: 1005 


30 aa


SEQ ID NO: 1007


tyros segment 19


90 nts


SEQ ID NO: 1008


Polypeptide encoded by SEQ ID NO: 1007 


30 aa 


SEQ ID NO: 1009 


tyros segment 20 


90 nts 


SEQ ID NO: 1010 


Polypeptide encoded by SEQ ID NO: 1009 


30 aa 


SEQ ID NO: 1011 


tyros segment 21 


90 nts 


SEQ ID NO: 1012 


Polypeptide encoded by SEQ ID NO: 1011 


30 aa 


SEQ ID NO: 1013 


tyros segment 22 


90 nts 


SEQ ID NO: 1014 


Polypeptide encoded by SEQ ID NO: 1013 


30 aa 


SEQ ID NO: 1015 


tyros segment 23 


90 nts 


SEQ ID NO: 1016 


Polypeptide encoded by SEQ ID NO: 1015 


30 aa 


SEQ ID NO: 1017 


tyros segment 24 


90 nts 


SEQ ID NO: 1018 


Polypeptide encoded by SEQ ID NO: 1017 


30 aa 


SEQ ID NO: 1019 


tyros segment 25 


90 nts 


SEQ ID NO: 1020 


Polypeptide encoded by SEQ ID NO: 1019 


30 aa 


SEQ ID NO: 1021 


tyros segment 26 


90 nts 


SEQ ID NO: 1022 


Polypeptide encoded by SEQ ID NO: 1021 


30 aa 


SEQ ID NO: 1023 


tyros segment 27 


90 nts 


SEQ ID NO: 1024 


Polypeptide encoded by SEQ ID NO: 1023 


30 aa 


SEQ ID NO: 1025 


tyros segment 28 


90 nts 


SEQ ID NO: 1026 


Polypeptide encoded by SEQ ID NO: 1025 


30 aa 


SEQ ID NO: 1027 


tyros segment 29 


90 nts 


SEQ ID NO: 1028 


Polypeptide encoded by SEQ ID NO: 1027


30 aa 


SEQ ID NO: 1029 


tyros segment 30 


90 nts 


SEQ ID NO: 1030 


Polypeptide encoded by SEQ ID NO: 1029 


30 aa


SEQ ID NO: 1031


tyros segment 31 


90 nts 


SEQIDNO: 1032 


Polypeptide encoded by SEQ ID NO: 1031 


30 aa 


SEQIDNO: 1033 


tyros segment 32 


90 nts 


SEQIDNO: 1034 


Polypeptide encoded by SEQ ID NO: 1033 


30 aa 


SEQIDNO: 1035 


tyros segment 33 


90 nts 


SEQIDNO: 1036 


Polypeptide encoded by SEQ ID NO: 1035 


30 aa 


SEQIDNO: 1037 


tyros segment 34 


90 nts 


SEQIDNO: 1038 


Polypeptide encoded by SEQ ID NO: 1037 


30 aa 


SEQIDNO: 1039 


tyros segment 35 


69 nts 


SEQIDNO: 1040 


Polypeptide encoded by SEQ ID NO: 1039 


23 aa 


SEQIDNO: 1041 


trp2 segment 1 


90 nts 


SEQIDNO: 1042 


Polypeptide encoded by SEQ ID NO: 1041 


30 aa 


SEQIDNO: 1043 


trp2 segment 2 


90 nts 


SEQIDNO: 1044 


Polypeptide encoded by SEQ ID NO: 1043 


30 aa 


SEQIDNO: 1045 


trp2 segment 3 


90 nts 


SEQIDNO: 1046 


Polypeptide encoded by SEQ ID NO: 1045 


30 aa 


SEQIDNO: 1047 


trp2 segment 4 


90 nts 


SEQIDNO: 1048 


Polypeptide encoded by SEQ ID NO: 1047 


30 aa 


SEQIDNO: 1049 


trp2 segment 5 


90 nts 


SEQIDNO: 1050 


Polypeptide encoded by SEQ ID NO: 1049 


30 aa 


SEQIDNO: 1051 


trp2 segment 6 


90 nts 


SEQIDNO: 1052 


Polypeptide encoded by SEQ ID NO: 1051 


30 aa 


SEQIDNO: 1053 


trp2 segment 7 


90 nts 


SEQIDNO: 1054 


Polypeptide encoded by SEQ ID NO: 1053 


30 aa 



WO 01/090197 



PCT/AU01/00622 






SEQ ID NO: 1055


trp2 segment 8


90 nts


SEQ ID NO: 1056


Polypeptide encoded by SEQ ID NO: 1055


30 aa 


SEQ ID NO: 1057


trp2 segment 9


90 nts


SEQIDNO: 1058 


Polypeptide encoded by SEQ ID NO: 1057 


30 aa 


SEQ ID NO: 1059


trp2 segment 10


90 nts


SEQ ID NO: 1060


Polypeptide encoded by SEQ ID NO: 1059


30 aa 


SEQ ID NO: 1061


trp2 segment 11


90 nts 


SEQ ID NO: 1062 


Polypeptide encoded by SEQ ID NO: 1061 


30 aa 


SEQIDNO: 1063 


trp2 segment 12 


90 nts 


SEQIDNO: 1064 


Polypeptide encoded by SEQ ID NO: 1063 


30 aa 


SEQIDNO: 1065 


trp2 segment 13 


90 nts 


SEQIDNO: 1066 


Polypeptide encoded by SEQ ID NO: 1065 


30 aa 


SEQIDNO: 1067 


trp2 segment 14 


90 nts 


SEQ ID NO: 1068 


Polypeptide encoded by SEQ ID NO: 1067 


30 aa 


SEQ ID NO: 1069 


trp2 segment 15 


90 nts 


SEQ ID NO: 1070 


Polypeptide encoded by SEQ ID NO: 1069 


30 aa 


SEQ ID NO: 1071 


trp2 segment 16 


90 nts 


SEQ ID NO: 1072 


Polypeptide encoded by SEQ ID NO: 1071 


30 aa 


SEQ ID NO: 1073 


trp2 segment 17 


90 nts 


SEQIDNO: 1074 


Polypeptide encoded by SEQ ID NO: 1073 


30 aa 


SEQIDNO: 1075 


trp2 segment 18 


90 nts 


SEQIDNO: 1076 


Polypeptide encoded by SEQ ID NO: 1075 


30 aa 


SEQIDNO: 1077 


trp2 segment 19


90 nts 


SEQIDNO: 1078 


Polypeptide encoded by SEQ ID NO: 1077 


30 aa


SEQ ID NO: 1079


trp2 segment 20


90 nts


SEQ ID NO: 1080


Polypeptide encoded by SEQ ID NO: 1079 


30 aa 


SEQ ID NO: 1081 


trp2 segment 21


90 nts 


SEQ ID NO: 1082 


Polypeptide encoded by SEQ ID NO: 1081 


30 aa 


SEQ ID NO: 1083 


trp2 segment 22 


90 nts 


SEQ ID NO: 1084 


Polypeptide encoded by SEQ ID NO: 1083 


30 aa 


SEQ ID NO: 1085 


trp2 segment 23 


90 nts 


SEQ ID NO: 1086 


Polypeptide encoded by SEQ ID NO: 1085 


30 aa 


SEQ ID NO: 1087 


trp2 segment 24 


90 nts 


SEQ ID NO: 1088 


Polypeptide encoded by SEQ ID NO: 1087 


30 aa 


SEQ ID NO: 1089 


trp2 segment 25 


90 nts 


SEQ ID NO: 1090 


Polypeptide encoded by SEQ ID NO: 1089 


30 aa 


SEQ ID NO: 1091 


trp2 segment 26


90 nts 


SEQ ID NO: 1092 


Polypeptide encoded by SEQ ID NO: 1091 


30 aa 


SEQ ID NO: 1093 


trp2 segment 27 


90 nts 


SEQ ID NO: 1094 


Polypeptide encoded by SEQ ID NO: 1093 


30 aa 


SEQ ID NO: 1095 


trp2 segment 28 


90 nts 


SEQ ID NO: 1096 


Polypeptide encoded by SEQ ID NO: 1095 


30 aa 


SEQ ID NO: 1097 


trp2 segment 29 


90 nts 


SEQ ID NO: 1098 


Polypeptide encoded by SEQ ID NO: 1097 


30 aa 


SEQ ID NO: 1099 


trp2 segment 30 


90 nts 


SEQ ID NO: 1100 


Polypeptide encoded by SEQ ID NO: 1099 


30 aa 


SEQ ID NO: 1101 


trp2 segment 31 


90 nts 


SEQ ID NO: 1102 


Polypeptide encoded by SEQ ID NO: 1101


30 aa


SEQ ID NO: 1103


trp2 segment 32


90 nts


SEQ ID NO: 1104


Polypeptide encoded by SEQ ID NO: 1103


30 aa 


SEQIDNO: 1105 


trp2 segment 33 


90 nts 


SEQIDNO: 1106 


Polypeptide encoded by SEQ ID NO: 1105


30 aa 


SEQIDNO: 1107 


trp2 segment 34 


84 nts 


SEQIDNO: 1108 


Polypeptide encoded by SEQ ID NO: 1107


28 aa 


SEQIDNO: 1109 


MC1R segment 1


90 nts 


SEQIDNO: 1110 


Polypeptide encoded by SEQ ID NO: 1109


30 aa 


SEQIDNO: 1111 


MC1R segment 2 


90 nts 


SEQIDNO: 1112 


Polypeptide encoded by SEQ ID NO: 1111 


30 aa 


SEQIDNO: 1113 


MC1R segment 3 


90 nts 


SEQIDNO: 1114 


Polypeptide encoded by SEQ ID NO: 1113


30 aa 


SEQIDNO: 1115 


MC1R segment 4 


90 nts


SEQIDNO: 1116 


Polypeptide encoded by SEQ ID NO: 1115


30 aa 


SEQIDNO: 1117 


MC1R segment 5 


90 nts 


SEQIDNO: 1118 


Polypeptide encoded by SEQ ID NO: 1117 


30 aa 


SEQIDNO: 1119 


MC1R segment 6 


90 nts 


SEQIDNO: 1120 


Polypeptide encoded by SEQ ID NO: 1119


30 aa 


SEQIDNO: 1121 


MC1R segment 7 


90 nts 


SEQIDNO: 1122 


Polypeptide encoded by SEQ ID NO: 1121 


30 aa 


SEQIDNO: 1123 


MC1R segment 8 


90 nts 


SEQIDNO: 1124 


Polypeptide encoded by SEQ ID NO: 1123


30 aa 


SEQIDNO: 1125 


MC1R segment 9 


90 nts 


SEQIDNO: 1126 


Polypeptide encoded by SEQ ID NO: 1125


30 aa


SEQ ID NO: 1127


MC1R segment 10


90 nts 


SEQIDNO: 1128 


Polypeptide encoded by SEQ ID NO: 1127


30 aa 


SEQIDNO: 1129 


MC1R segment 11 


90 nts 


SEQIDNO: 1130 


Polypeptide encoded by SEQ ID NO: 1129


30 aa 


SEQIDNO: 1131 


MC1R segment 12 


90 nts 


SEQIDNO: 1132 


Polypeptide encoded by SEQ ID NO: 1131


30 aa 


SEQIDNO: 1133 


MC1R segment 13 


90 nts 


SEQIDNO: 1134 


Polypeptide encoded by SEQ ID NO: 1133


30 aa 


SEQIDNO: 1135 


MC1R segment 14 


90 nts 


SEQIDNO: 1136 


Polypeptide encoded by SEQ ID NO: 1135


30 aa 


SEQIDNO: 1137 


MC1R segment 15


90 nts 


SEQIDNO: 1138 


Polypeptide encoded by SEQ ID NO: 1137


30 aa 


SEQIDNO: 1139 


MC1R segment 16 


90 nts 


SEQIDNO: 1140 


Polypeptide encoded by SEQ ID NO: 1139


30 aa 


SEQIDNO: 1141 


MC1R segment 17 


90 nts 


SEQIDNO: 1142 


Polypeptide encoded by SEQ ID NO: 1141


30 aa 


SEQIDNO: 1143 


MC1R segment 18 


90 nts 


SEQIDNO: 1144 


Polypeptide encoded by SEQ ID NO: 1143


30 aa 


SEQIDNO: 1145 


MC1R segment 19 


90 nts 


SEQIDNO: 1146 


Polypeptide encoded by SEQ ID NO: 1145


30 aa 


SEQIDNO: 1147 


MC1R segment 20 


90 nts 


SEQIDNO: 1148 


Polypeptide encoded by SEQ ID NO: 1147


30 aa 


SEQIDNO: 1149 


MC1R segment 21 


63 nts 


SEQIDNO: 1150 


Polypeptide encoded by SEQ ID NO: 1149


21 aa


SEQ ID NO: 1151


MUC1F segment 1


90 nts 


SEQIDNO:1152 


Polypeptide encoded by SEQ ID NO: 1151 


30 aa 


SEQIDNO:1153 


MUC1F segment 2


90 nts


SEQ ID NO: 1154


Polypeptide encoded by SEQ ID NO: 1153


30 aa 


SEQIDNO:1155 


MUC1F segment 3


90 nts


SEQ ID NO: 1156


Polypeptide encoded by SEQ ID NO: 1155


30 aa 


SEQIDNO:1157 


MUC1F segment 4


90 nts 


SEQIDNO: 1158 


Polypeptide encoded by SEQ ID NO: 1157 


30 aa 


SEQIDNO: 1159 


MUC1F segment 5


90 nts


SEQ ID NO: 1160


Polypeptide encoded by SEQ ID NO: 1159


30 aa 


SEQIDNO: 1161 


MUC1F segment 6


90 nts


SEQ ID NO: 1162


Polypeptide encoded by SEQ ID NO: 1161


30 aa 


SEQIDNO: 1163 


MUC1F segment 7


90 nts


SEQ ID NO: 1164


Polypeptide encoded by SEQ ID NO: 1163


30 aa 


SEQIDNO: 1165 


MUC1F segment 8


72 nts


SEQ ID NO: 1166


Polypeptide encoded by SEQ ID NO: 1165


24 aa 


SEQIDNO: 1167 


MUC1R segment 1 


90 nts 


SEQIDNO: 1168 


Polypeptide encoded by SEQ ID NO: 1167


30 aa 


SEQIDNO: 1169 


MUC1R segment 2 


90 nts 


SEQIDNO: 1170 


Polypeptide encoded by SEQ ID NO: 1169


30 aa 


SEQIDNO: 1171 


MUC1R segment 3 


90 nts 


SEQIDNO: 1172 


Polypeptide encoded by SEQ ID NO: 1171 


30 aa 


SEQIDNO: 1173 


MUC1R segment 4 


90 nts


SEQIDNO: 1174 


Polypeptide encoded by SEQ ID NO: 1173


30 aa


SEQ ID NO: 1175


MUC1R segment 5 


90 nts 


SEQIDNO: 1176 


Polypeptide encoded by SEQ ID NO: 1175


30 aa 


SEQIDNO: 1177 


MUC1R segment 6 


90 nts 


SEQ ID NO: 1178 


Polypeptide encoded by SEQ ID NO: 1177


30 aa 


SEQIDNO: 1179 


MUC1R segment 7 


90 nts 


SEQIDNO: 1180 


Polypeptide encoded by SEQ ID NO: 1179


30 aa 


SEQIDNO: 1181 


MUC1R segment 8 


90 nts 


SEQIDNO: 1182 


Polypeptide encoded by SEQ ID NO: 1181


30 aa 


SEQIDNO: 1183 


MUC1R segment 9 


90 nts 


SEQIDNO: 1184 


Polypeptide encoded by SEQ ID NO: 1183


30 aa 


SEQIDNO: 1185 


MUC1R segment 10 


90 nts 


SEQIDNO: 1186 


Polypeptide encoded by SEQ ID NO: 1185


30 aa 


SEQIDNO: 1187 


MUC1R segment 11 


90 nts 


SEQIDNO: 1188 


Polypeptide encoded by SEQ ID NO: 1187


30 aa 


SEQIDNO: 1189 


MUC1R segment 12 


90 nts 


SEQIDNO: 1190 


Polypeptide encoded by SEQ ID NO: 1189


30 aa 


SEQIDNO: 1191 


MUC1R segment 13 


90 nts 


SEQIDNO: 1192 


Polypeptide encoded by SEQ ID NO: 1191


30 aa 


SEQIDNO: 1193 


MUC1R segment 14 


90 nts 


SEQIDNO: 1194 


Polypeptide encoded by SEQ ID NO: 1193


30 aa 


SEQIDNO: 1195 


MUC1R segment 15 


90 nts 


SEQIDNO: 1196 


Polypeptide encoded by SEQ ID NO: 1195


30 aa 


SEQIDNO: 1197 


MUC1R segment 16 


90 nts 


SEQ ID NO: 1198 


Polypeptide encoded by SEQ ID NO: 1197


30 aa


SEQ ID NO: 1199


MUC1R segment 17


90 nts


SEQ ID NO: 1200


Polypeptide encoded by SEQ ID NO: 1199


30 aa 


SEQ ID NO: 1201


MUC1R segment 18


90 nts 


SEQ ID NO: 1202 


Polypeptide encoded by SEQ ID NO: 1201 


30 aa 


SEQ ID NO: 1203


MUC1R segment 19


90 nts 


SEQ ID NO: 1204 


Polypeptide encoded by SEQ ID NO: 1203 


30 aa 


SEQ ID NO: 1205 


MUC1R segment 20


90 nts 


SEQ ID NO: 1206 


Polypeptide encoded by SEQ ID NO: 1205 


30 aa 


SEQ ID NO: 1207 


MUC1R segment 21


48 nts 


SEQ ID NO: 1208 


Polypeptide encoded by SEQ ID NO: 1207 


16 aa 


SEQ ID NO: 1209 


Differentiation Savine 


16638 nts 


SEQ ID NO: 1210 


Polypeptide encoded by SEQ ID NO: 1209 


5546 aa 


SEQ ID NO: 1211 


BAGE segment 1 


90 nts 


SEQ ID NO: 1212 


Polypeptide encoded by SEQ ID NO: 1211 


30 aa 


SEQ ID NO: 1213 


BAGE segment 2 


90 nts 


SEQ ID NO: 1214 


Polypeptide encoded by SEQ ID NO: 1213 


30 aa 


SEQ ID NO: 1215 


BAGE segment 3 


51 nts 


SEQ ID NO: 1216 


Polypeptide encoded by SEQ ID NO: 1215 


17 aa 


SEQ ID NO: 1217 


GAGE-1 segment 1 


90 nts 


SEQ ID NO: 1218 


Polypeptide encoded by SEQ ID NO: 1217 


30 aa 


SEQ ID NO: 1219 


GAGE-1 segment 2 


90 nts 


SEQ ID NO: 1220 


Polypeptide encoded by SEQ ID NO: 1219 


30 aa 


SEQ ID NO: 1221 


GAGE-1 segment 3 


90 nts 


SEQ ID NO: 1222 


Polypeptide encoded by SEQ ID NO: 1221 


30 aa 







SEQ ID NO: 1223


GAGE-1 segment 4


90 nts 


SEQ ID NO: 1224 


Polypeptide encoded by SEQ ID NO: 1223 


30 aa 


SEQ ID NO: 1225 


GAGE-1 segment 5 


90 nts 


SEQ ID NO: 1226 


Polypeptide encoded by SEQ ID NO: 1225 


30 aa 


SEQ ID NO: 1227 


GAGE-1 segment 6 


90 nts 


SEQ ID NO: 1228 


Polypeptide encoded by SEQ ID NO: 1227 


30 aa 


SEQ ID NO: 1229 


GAGE-1 segment 7 


90 nts 


SEQ ID NO: 1230 


Polypeptide encoded by SEQ ID NO: 1229 


30 aa 


SEQ ID NO: 1231 


GAGE-1 segment 8 


90 nts 


SEQ ID NO: 1232 


Polypeptide encoded by SEQ ID NO: 1231 


30 aa 


SEQ ID NO: 1233 


GAGE-1 segment 9 


66 nts 


SEQ ID NO: 1234 


Polypeptide encoded by SEQ ID NO: 1233 


22 aa 


SEQ ID NO: 1235 


gp100In4 segment 1


90 nts 


SEQ ID NO: 1236 


Polypeptide encoded by SEQ ID NO: 1235 


30 aa 


SEQ ID NO: 1237 


gp100In4 segment 2


90 nts


SEQ ID NO: 1238


Polypeptide encoded by SEQ ID NO: 1237


30 aa 


SEQ ID NO: 1239 


gp100In4 segment 3


75 nts 


SEQ ID NO: 1240 


Polypeptide encoded by SEQ ID NO: 1239 


25 aa 




SEQ ID NO: 1241


MAGE-1 segment 1


90 nts


SEQ ID NO: 1242 


Polypeptide encoded by SEQ ID NO: 1241 


30 aa 


SEQIDNO: 1243 


MAGE-1 segment 2 


90 nts 


SEQ ID NO: 1244 


Polypeptide encoded by SEQ ID NO: 1243 


30 aa 


SEQIDNO: 1245 


MAGE-1 segment 3 


90 nts 


SEQIDNO: 1246 


Polypeptide encoded by SEQ ID NO: 1245 


30 aa


SEQ ID NO: 1247


MAGE-1 segment 4 


90 nts 


SEQIDNO: 1248 


Polypeptide encoded by SEQ ID NO: 1247 


30 aa 


SEQIDNO: 1249 


MAGE-1 segment 5 


90 nts 


SEQIDNO: 1250 


Polypeptide encoded by SEQ ID NO: 1249 


30 aa 


SEQIDNO: 1251 


MAGE-1 segment 6 


90 nts 


SEQIDNO: 1252 


Polypeptide encoded by SEQ ID NO: 1251 


30 aa 


SEQIDNO: 1253 


MAGE-1 segment 7 


90 nts 


SEQIDNO: 1254 


Polypeptide encoded by SEQ ID NO: 1253 


30 aa 


SEQIDNO: 1255 


MAGE-1 segment 8 


90 nts 


SEQIDNO: 1256 


Polypeptide encoded by SEQ ID NO: 1255 


30 aa 


SEQIDNO: 1257 


MAGE-1 segment 9 


90 nts 


SEQIDNO: 1258 


Polypeptide encoded by SEQ ID NO: 1257 


30 aa 


SEQ ID NO: 1259 


MAGE-1 segment 10 


90 nts 


SEQIDNO: 1260 


Polypeptide encoded by SEQ ID NO: 1259 


30 aa 


SEQ ID NO: 1261 


MAGE-1 segment 11 


90 nts 


SEQ ID NO: 1262 


Polypeptide encoded by SEQ ID NO: 1261 


30 aa 


SEQ ID NO: 1263 


MAGE-1 segment 12 


90 nts 


SEQ ID NO: 1264 


Polypeptide encoded by SEQ ID NO: 1263 


30 aa 


SEQ ID NO: 1265


MAGE-1 segment 13


90 nts 


SEQIDNO: 1266 


Polypeptide encoded by SEQ ID NO: 1265 


30 aa 


SEQIDNO: 1267 


MAGE-1 segment 14 


90 nts 


SEQIDNO: 1268 


Polypeptide encoded by SEQ ID NO: 1267 


30 aa 


SEQIDNO: 1269 


MAGE-1 segment 15 


90 nts 


SEQIDNO: 1270 


Polypeptide encoded by SEQ ID NO: 1269 


30 aa


SEQ ID NO: 1271


MAGE-1 segment 16


90 nts


SEQ ID NO: 1272 


Polypeptide encoded by SEQ ID NO: 1271 


30 aa 


SEQ ID NO: 1273 


MAGE-1 segment 17 


90 nts 


SEQ ID NO: 1274 


Polypeptide encoded by SEQ ID NO: 1273 


30 aa 


SEQ ID NO: 1275 


MAGE-1 segment 18 


90 nts 


SEQ ID NO: 1276 


Polypeptide encoded by SEQ ID NO: 1275 


30 aa 


SEQ ID NO: 1277 


MAGE-1 segment 19 


90 nts 


SEQ ID NO: 1278 


Polypeptide encoded by SEQ ID NO: 1277 


30 aa 


SEQ ID NO: 1279 


MAGE-1 segment 20 


84 nts 


SEQ ID NO: 1280 


Polypeptide encoded by SEQ ID NO: 1279 


28 aa 


SEQIDNO: 1281 


MAGE-3 segment 1 


90 nts 


SEQ ID NO: 1282 


Polypeptide encoded by SEQ ID NO: 1281 


30 aa 


SEQ ID NO: 1283 


MAGE-3 segment 2 


90 nts 


SEQ ID NO: 1284 


Polypeptide encoded by SEQ ID NO: 1283 


30 aa 


SEQ ID NO: 1285 


MAGE-3 segment 3 


90 nts 


SEQIDNO: 1286 


Polypeptide encoded by SEQ ID NO: 1285 


30 aa 


SEQ ID NO: 1287 


MAGE-3 segment 4 


90 nts 


SEQ ID NO: 1288 


Polypeptide encoded by SEQ ID NO: 1287 


30 aa 


SEQ ID NO: 1289


MAGE-3 segment 5


90 nts


SEQ ID NO: 1290


Polypeptide encoded by SEQ ID NO: 1289


30 aa 


SEQIDNO: 1291 


MAGE-3 segment 6 


90 nts 


SEQIDNO: 1292 


Polypeptide encoded by SEQ ID NO: 1291 


30 aa 


SEQIDNO: 1293 


MAGE-3 segment 7 


90 nts 


SEQIDNO: 1294 


Polypeptide encoded by SEQ ID NO: 1293 


30 aa 






SEQ ID NO: 1295 


MAGE-3 segment 8 


90 nts 


SEQIDNO: 1296 


Polypeptide encoded by SEQ ID NO: 1295 


30 aa 


SEQ ID NO: 1297 


MAGE-3 segment 9 


90 nts 


SEQ ID NO: 1298 


Polypeptide encoded by SEQ ID NO: 1297 


30 aa 


SEQIDNO: 1299 


MAGE-3 segment 10 


90 nts 


SEQ ID NO: 1300 


Polypeptide encoded by SEQ ID NO: 1299 


30 aa 


SEQIDNO: 1301 


MAGE-3 segment 11 


90 nts 


SEQIDNO: 1302 


Polypeptide encoded by SEQ ID NO: 1301 


30 aa 


SEQIDNO: 1303 


MAGE-3 segment 12 


90 nts 


SEQIDNO: 1304 


Polypeptide encoded by SEQ ID NO: 1303


30 aa 


SEQIDNO: 1305 


MAGE-3 segment 13 


90 nts 


SEQIDNO: 1306 


Polypeptide encoded by SEQ ID NO: 1305 


30 aa 


SEQIDNO: 1307 


MAGE-3 segment 14 


90 nts 


SEQIDNO: 1308 


Polypeptide encoded by SEQ ID NO: 1307 


30 aa 


SEQIDNO: 1309 


MAGE-3 segment 15 


90 nts 


SEQIDNO: 1310 


Polypeptide encoded by SEQ ID NO: 1309 


30 aa 


SEQIDNO: 1311 


MAGE-3 segment 16 


90 nts 


SEQIDNO: 1312 


Polypeptide encoded by SEQ ID NO: 1311 


30 aa 


SEQIDNO: 1313 


MAGE-3 segment 17 


90 nts 


SEQ ID NO: 1314 


Polypeptide encoded by SEQ ID NO: 1313 


30 aa 


SEQIDNO: 1315 


MAGE-3 segment 18 


90 nts 


SEQIDNO: 1316 


Polypeptide encoded by SEQ ID NO: 1315 


30 aa 


SEQIDNO: 1317 


MAGE-3 segment 19 


90 nts 


SEQIDNO: 1318 


Polypeptide encoded by SEQ ID NO: 1317 


30 aa


SEQ ID NO: 1319


MAGE-3 segment 20 


90 nts 


SEQ ID NO: 1320 


Polypeptide encoded by SEQ ID NO: 1319 


30 aa 


SEQIDNO: 1321 


MAGE-3 segment 21 


54 nts 


SEQIDNO: 1322 


Polypeptide encoded by SEQ ID NO: 1321 


18 aa 


SEQIDNO: 1323 


PRAME segment 1 


90 nts 


SEQIDNO: 1324 


Polypeptide encoded by SEQ ID NO: 1323 


30 aa 


SEQIDNO: 1325 


PRAME segment 2 


90 nts


SEQ ID NO: 1326 


Polypeptide encoded by SEQ ID NO: 1325 


30 aa 


SEQIDNO: 1327 


PRAME segment 3 


90 nts 


SEQIDNO: 1328 


Polypeptide encoded by SEQ ID NO: 1327 


30 aa 


SEQIDNO: 1329 


PRAME segment 4 


90 nts 


SEQIDNO: 1330 


Polypeptide encoded by SEQ ID NO: 1329


30 aa 


SEQIDNO: 1331 


PRAME segment 5 


90 nts 


SEQIDNO: 1332 


Polypeptide encoded by SEQ ID NO: 1331 


30 aa 


SEQIDNO: 1333 


PRAME segment 6 


90 nts 


SEQIDNO: 1334 


Polypeptide encoded by SEQ ID NO: 1333 


30 aa 


SEQIDNO: 1335 


PRAME segment 7 


90 nts 


SEQIDNO: 1336 


Polypeptide encoded by SEQ ID NO: 1335 


30 aa 


SEQIDNO: 1337 


PRAME segment 8 


90 nts 


SEQIDNO: 1338 


Polypeptide encoded by SEQ ID NO: 1337 


30 aa 


SEQIDNO: 1339 


PRAME segment 9 


90 nts 


SEQIDNO: 1340 


Polypeptide encoded by SEQ ID NO: 1339 


30 aa 


SEQIDNO: 1341 


PRAME segment 10 


90 nts 


SEQIDNO: 1342 


Polypeptide encoded by SEQ ID NO: 1341


30 aa


SEQ ID NO: 1343


PRAME segment 11


90 nts


SEQIDNO: 1344 


Polypeptide encoded by SEQ ID NO: 1343 


30 aa 


SEQIDNO: 1345 


PRAME segment 12 


90 nts 


SEQIDNO: 1346 


Polypeptide encoded by SEQ ID NO: 1345


30 aa 


SEQIDNO: 1347 


PRAME segment 13 


90 nts 


SEQIDNO: 1348 


Polypeptide encoded by SEQ ID NO: 1347 


30 aa 


SEQIDNO: 1349 


PRAME segment 14 


90 nts 


SEQIDNO: 1350 


Polypeptide encoded by SEQ ID NO: 1349 


30 aa 


SEQIDNO: 1351 


PRAME segment 15 


90 nts 


SEQIDNO: 1352 


Polypeptide encoded by SEQ ID NO: 1351


30 aa 


SEQIDNO: 1353 


PRAME segment 16 


90 nts 


SEQIDNO: 1354 


Polypeptide encoded by SEQ ID NO: 1353 


30 aa 


SEQIDNO: 1355 


PRAME segment 17 


90 nts 


SEQIDNO: 1356 


Polypeptide encoded by SEQ ID NO: 1355 


30 aa 


SEQIDNO: 1357 


PRAME segment 18 


90 nts 


SEQIDNO: 1358 


Polypeptide encoded by SEQ ID NO: 1357 


30 aa 


SEQIDNO: 1359 


PRAME segment 19 


90 nts 


SEQIDNO: 1360 


Polypeptide encoded by SEQ ID NO: 1359 


30 aa 


SEQIDNO: 1361 


PRAME segment 20 


90 nts 


SEQIDNO: 1362 


Polypeptide encoded by SEQ ID NO: 1361 


30 aa 


SEQIDNO: 1363 


PRAME segment 21 


90 nts 


SEQIDNO: 1364 


Polypeptide encoded by SEQ ID NO: 1363 


30 aa 


SEQIDNO: 1365 


PRAME segment 22 


90 nts 


SEQ ID NO: 1366


Polypeptide encoded by SEQ ID NO: 1365 


30 aa 






SEQ ID NO: 1367 


PRAME segment 23 


90 nts 


SEQIDNO: 1368 


Polypeptide encoded by SEQ ID NO: 1367 


30 aa 


SEQ ID NO: 1369 


PRAME segment 24 


90 nts 


SEQ ID NO: 1370 


Polypeptide encoded by SEQ ID NO: 1369 


30 aa 


SEQIDNO: 1371 


PRAME segment 25 


90 nts 


SEQIDNO: 1372 


Polypeptide encoded by SEQ ID NO: 1371 


30 aa 


SEQ ID NO: 1373


PRAME segment 26 


90 nts 


SEQIDNO: 1374 


Polypeptide encoded by SEQ ID NO: 1373 


30 aa 


SEQ ID NO: 1375 


PRAME segment 27 


90 nts 


SEQIDNO: 1376 


Polypeptide encoded by SEQ ID NO: 1375 


30 aa 


SEQIDNO: 1377 


PRAME segment 28 


90 nts 


SEQIDNO: 1378 


Polypeptide encoded by SEQ ID NO: 1377 


30 aa 


SEQ ID NO: 1379 


PRAME segment 29 


90 nts 


SEQ ID NO: 1380 


Polypeptide encoded by SEQ ID NO: 1379 


30 aa 


SEQ ID NO: 1381 


PRAME segment 30 


90 nts 


SEQIDNO: 1382 


Polypeptide encoded by SEQ ID NO: 1381


30 aa 


SEQIDNO: 1383 


PRAME segment 31


90 nts


SEQIDNO: 1384 


Polypeptide encoded by SEQ ID NO: 1383 


30 aa 


SEQ ID NO: 1385


PRAME segment 32


90 nts 


SEQIDNO: 1386 


Polypeptide encoded by SEQ ID NO: 1385 


30 aa 


SEQIDNO: 1387 


PRAME segment 33 


90 nts 


SEQ ID NO: 1388 


Polypeptide encoded by SEQ ID NO: 1387 


30 aa 


SEQIDNO: 1389 


PRAME segment 34 


54 nts 


SEQIDNO: 1390 


Polypeptide encoded by SEQ ID NO: 1389 


18 aa


SEQ ID NO: 1391


TRP2IN2 segment 1


90 nts


SEQ ID NO: 1392


Polypeptide encoded by SEQ ID NO: 1391


30 aa


SEQ ID NO: 1393


TRP2IN2 segment 2


90 nts


SEQ ID NO: 1394


Polypeptide encoded by SEQ ID NO: 1393


30 aa


SEQ ID NO: 1395


TRP2IN2 segment 3


84 nts


SEQ ID NO: 1396


Polypeptide encoded by SEQ ID NO: 1395


28 aa


SEQ ID NO: 1397


NYNSO1a segment 1


90 nts


SEQ ID NO: 1398


Polypeptide encoded by SEQ ID NO: 1397


30 aa


SEQ ID NO: 1399


NYNSO1a segment 2


90 nts


SEQ ID NO: 1400


Polypeptide encoded by SEQ ID NO: 1399


30 aa


SEQ ID NO: 1401


NYNSO1a segment 3


90 nts


SEQ ID NO: 1402


Polypeptide encoded by SEQ ID NO: 1401


30 aa


SEQ ID NO: 1403


NYNSO1a segment 4


90 nts


SEQ ID NO: 1404


Polypeptide encoded by SEQ ID NO: 1403


30 aa


SEQ ID NO: 1405


NYNSO1a segment 5


90 nts


SEQ ID NO: 1406


Polypeptide encoded by SEQ ID NO: 1405


30 aa


SEQ ID NO: 1407


NYNSO1a segment 6


90 nts


SEQ ID NO: 1408


Polypeptide encoded by SEQ ID NO: 1407


30 aa


SEQ ID NO: 1409


NYNSO1a segment 7


90 nts


SEQ ID NO: 1410


Polypeptide encoded by SEQ ID NO: 1409 


30 aa 


SEQ ID NO: 1411


NYNSO1a segment 8


90 nts


SEQ ID NO: 1412


Polypeptide encoded by SEQ ID NO: 1411


30 aa 


SEQIDNO: 1413 


NYNSO1a segment 9


90 nts 


SEQIDNO: 1414 


Polypeptide encoded by SEQ ID NO: 1413 


30 aa 







SEQ ID NO: 1415


NYNSO1a segment 10


90 nts


SEQ ID NO: 1416


Polypeptide encoded by SEQ ID NO: 1415


30 aa 


SEQ ID NO: 1417


NYNSO1a segment 11


90 nts


SEQIDNO: 1418 


Polypeptide encoded by SEQ ID NO: 1417 


30 aa 


SEQIDNO: 1419 


NYNSO1a segment 12


57 nts 


SEQIDNO: 1420 


Polypeptide encoded by SEQ ID NO: 1419 


19 aa 


SEQIDNO: 1421 


NYNSO1b segment 1


90 nts 


SEQIDNO: 1422 


Polypeptide encoded by SEQ ID NO: 1421 


30 aa 


SEQIDNO: 1423 


NYNSO1b segment 2


90 nts 


SEQIDNO: 1424 


Polypeptide encoded by SEQ ID NO: 1423 


30 aa 


SEQIDNO: 1425 


NYNSO1b segment 3


90 nts 


SEQIDNO: 1426 


Polypeptide encoded by SEQ ID NO: 1425 


30 aa 


SEQIDNO: 1427 


NYNSO1b segment 4


51 nts 


SEQIDNO: 1428 


Polypeptide encoded by SEQ ID NO: 1427


17 aa




SEQIDNO: 1429 


LAGE1 segment 1 


90 nts 


SEQ ID NO: 1430 


Polypeptide encoded by SEQ ID NO: 1429 


30 aa 


SEQIDNO: 1431 


LAGE1 segment 2 


90 nts 


SEQIDNO: 1432 


Polypeptide encoded by SEQ ID NO: 1431 


30 aa 


SEQ ID NO: 1433 


LAGE1 segment 3 


90 nts 


SEQIDNO: 1434 


Polypeptide encoded by SEQ ID NO: 1433 


30 aa 


SEQIDNO: 1435 


LAGE1 segment 4 


90 nts 


SEQ ID NO: 1436 


Polypeptide encoded by SEQ ID NO: 1435 


30 aa 


SEQIDNO: 1437 


LAGE1 segment 5 


90 nts 


SEQIDNO: 1438 


Polypeptide encoded by SEQ ID NO: 1437 


30 aa


SEQ ID NO: 1439


LAGE1 segment 6


90 nts


SEQIDNO: 1440 


Polypeptide encoded by SEQ ID NO: 1439 


30 aa 


SEQ ID NO: 1441 


LAGE1 segment 7


90 nts 


SEQIDNO: 1442 


Polypeptide encoded by SEQ ID NO: 1441 


30 aa 


SEQ ID NO: 1443 


LAGE1 segment 8


90 nts


SEQ ID NO: 1444


Polypeptide encoded by SEQ ID NO: 1443


30 aa 


SEQIDNO: 1445 


LAGE1 segment 9


90 nts 


SEQ ID NO: 1446 


Polypeptide encoded by SEQ ID NO: 1445 


30 aa 


SEQ ID NO: 1447 


LAGE1 segment 10


90 nts 


SEQIDNO: 1448 


Polypeptide encoded by SEQ ID NO: 1447 


30 aa 


SEQIDNO: 1449 


LAGE1 segment 11


90 nts 


SEQ ID NO: 1450 


Polypeptide encoded by SEQ ID NO: 1449 


30 aa 


SEQIDNO: 1451 


LAGE1 segment 12


57 nts 


SEQ ID NO: 1452 


Polypeptide encoded by SEQ ID NO: 1451 


19 aa 


SEQIDNO: 1453 


Melanoma cancer specific Savine 


10623 nts 


SEQ ID NO: 1454 


Polypeptide encoded by SEQ ID NO: 1453 


3541 aa 


SEQIDNO: 1455 


Figure 16 A1S1 99mer 


99 nts 


SEQ ID NO: 1456 


Figure 16 A1S2 100mer


100 nts


SEQ ID NO: 1457


Figure 16 A1S3 100mer


100 nts


SEQ ID NO: 1458


Figure 16 A1S4 100mer


100 nts


SEQ ID NO: 1459


Figure 16 A1S5 100mer


100 nts


SEQ ID NO: 1460


Figure 16 A1S6 99mer


99 nts


SEQ ID NO: 1461


Figure 16 A1S7 97mer


99 nts


SEQ ID NO: 1462


Figure 16 A1S8 100mer


100 nts


SEQ ID NO: 1463


Figure 16 A1S9 100mer


100 nts


SEQ ID NO: 1464


Figure 16 A1S10 75mer


76 nts


SEQ ID NO: 1465


Figure 16 A1F 20mer


20 nts


SEQ ID NO: 1466


Figure 16 A1R 20mer


20 nts


SEQIDNO: 1467 


Amino acid sequence of immunostimulatory 
domain of an invasin protein from Yersinia spp. 


16 aa 
DETAILED DESCRIPTION OF THE INVENTION
1. Definitions

The articles "a" and "an" are used herein to refer to one or to more than one (i.e.,
to at least one) of the grammatical object of the article. By way of example, "an element"
means one element or more than one element.

As used herein, the term "about" refers to a quantity, level, value, dimension, 
size, or amount that varies by as much as 30%, preferably by as much as 20%, and more 
preferably by as much as 10% to a reference quantity, level, value, dimension, size, or 
amount. 

By "antigen-binding molecule" is meant a molecule that has binding affinity for a
target antigen. It will be understood that this term extends to immunoglobulins,
immunoglobulin fragments and non-immunoglobulin derived protein frameworks that
exhibit antigen-binding activity.

The term "clade" as used herein refers to a hypothetical species of an organism
and its descendants or a monophyletic group of organisms. Clades carry a definition, based
on ancestry, and a diagnosis, based on synapomorphies. It should be noted that diagnoses
of clades could change while definitions do not.

Throughout this specification, unless the context requires otherwise, the words
"comprise", "comprises" and "comprising" will be understood to imply the inclusion of a
stated step or element or group of steps or elements but not the exclusion of any other step
or element or group of steps or elements.

By "expression vector" is meant any autonomous genetic element capable of 
directing the synthesis of a protein encoded by the vector. Such expression vectors are 
known by practitioners in the art. 

As used herein, the term "function" refers to a biological, enzymatic, or
therapeutic function.

"Homology" refers to the percentage number of amino acids that are identical or
constitute conservative substitutions as defined in Table B infra. Homology may be
determined using sequence comparison programs such as GAP (Deveraux et al. 1984,
Nucleic Acids Research 12, 387-395). In this way, sequences of a similar or substantially
different length to those cited herein might be compared by insertion of gaps into the
alignment, such gaps being determined, for example, by the comparison algorithm used by
GAP.

To enhance an immune response ("immunoenhancement"), as is well-known in
the art, means to increase an animal's capacity to respond to foreign or disease-specific
antigens (e.g., cancer antigens), i.e., those cells primed to attack such antigens are increased
in number, activity, and ability to detect and destroy those antigens. Strength of
immune response is measured by standard tests including: direct measurement of
peripheral blood lymphocytes by means known to the art; natural killer cell cytotoxicity
assays (see, e.g., Provinciali, M. et al. (1992, J. Immunol. Meth. 155: 19-24)); cell
proliferation assays (see, e.g., Vollenweider, I. and Groscurth, P. J. (1992, J. Immunol.
Meth. 149: 133-135)); immunoassays of immune cells and subsets (see, e.g., Loeffler, D.
A., et al. (1992, Cytom. 13: 169-174); Rivoltini, L., et al. (1992, Can. Immunol.
Immunother. 34: 241-251)); or skin tests for cell-mediated immunity (see, e.g., Chang, A.
E. et al. (1993, Cancer Res. 53: 1043-1050)). Any statistically significant increase in
strength of immune response as measured by the foregoing tests is considered "enhanced
immune response", "immunoenhancement" or "immunopotentiation" as used herein.
Enhanced immune response is also indicated by physical manifestations such as fever and
inflammation, as well as healing of systemic and local infections, and reduction of
symptoms in disease, i.e., decrease in tumour size, alleviation of symptoms of a disease or
condition including, but not restricted to, leprosy, tuberculosis, malaria, aphthous ulcers,
herpetic and papillomatous warts, gingivitis, atherosclerosis, the concomitants of AIDS
such as Kaposi's sarcoma, bronchial infections, and the like. Such physical manifestations
also define "enhanced immune response", "immunoenhancement" or
"immunopotentiation" as used herein.

Reference herein to "immuno-interactive" includes reference to any interaction,
reaction, or other form of association between molecules and in particular where one of the
molecules is, or mimics, a component of the immune system.

By "isolated" is meant material that is substantially or essentially free from
components that normally accompany it in its native state.

By "modulating" is meant increasing or decreasing, either directly or indirectly,
an immune response against a target antigen of a member selected from the group
consisting of a cancer and an organism, preferably a pathogenic organism.

By "natural gene" is meant a gene that naturally encodes a protein.

The term "natural polypeptide" as used herein refers to a polypeptide that exists 
in nature. 

By "obtained from" is meant that a sample such as, for example, a polynucleotide
extract or polypeptide extract is isolated from, or derived from, a particular source of the
host. For example, the extract can be obtained from a tissue or a biological fluid isolated
directly from the host.

The term "oligonucleotide" as used herein refers to a polymer composed of a
multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related
structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or
related structural variants or synthetic analogues thereof). Thus, while the term
"oligonucleotide" typically refers to a nucleotide polymer in which the nucleotide residues
and linkages between them are naturally occurring, it will be understood that the term also
includes within its scope various analogues including, but not restricted to, peptide nucleic
acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl
ribonucleic acids, and the like. The exact size of the molecule can vary depending on the
particular application. An oligonucleotide is typically rather short in length, generally from
about 10 to 30 nucleotide residues, but the term can refer to molecules of any length,
although the term "polynucleotide" or "nucleic acid" is typically used for large
oligonucleotides.

By "operably linked" is meant that transcriptional and translational regulatory
polynucleotides are positioned relative to a polypeptide-encoding polynucleotide in such a
manner that the polynucleotide is transcribed and the polypeptide is translated.

The term "parent polypeptide" as used herein typically refers to a polypeptide
encoded by a natural gene. However, it is possible that the parent polypeptide corresponds
to a protein that is not naturally-occurring but has been engineered using recombinant
techniques. In this instance, a polynucleotide encoding the parent polypeptide may
comprise different but synonymous codons relative to a natural gene encoding the same
polypeptide. Alternatively, the parent polypeptide may not correspond to a natural
polypeptide sequence. For example, the parent polypeptide may comprise one or more
consensus sequences common to a plurality of polypeptides.

The term "patient" refers to patients of human or other mammal and includes any
individual it is desired to examine or treat using the methods of the invention. However, it
will be understood that "patient" does not imply that symptoms are present. Suitable
mammals that fall within the scope of the invention include, but are not restricted to,
primates, livestock animals (e.g., sheep, cows, horses, donkeys, pigs), laboratory test
animals (e.g., rabbits, mice, rats, guinea pigs, hamsters), companion animals (e.g., cats,
dogs) and captive wild animals (e.g., foxes, deer, dingoes).

By "pharmaceutically-acceptable carrier" is meant a solid or liquid filler, diluent
or encapsulating substance that can be safely used in topical or systemic administration to a
mammal.

"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to
a polymer of amino acid residues and to variants and synthetic analogues of the same.
Thus, these terms apply to amino acid polymers in which one or more amino acid residues
is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a
corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid
polymers.

The term "polynucleotide" or "nucleic acid" as used herein designates mRNA,
RNA, cRNA, cDNA or DNA. The term typically refers to oligonucleotides greater than 30
nucleotide residues in length.

By "primer" is meant an oligonucleotide which, when paired with a strand of
DNA, is capable of initiating the synthesis of a primer extension product in the presence of
a suitable polymerising agent. The primer is preferably single-stranded for maximum
efficiency in amplification but can alternatively be double-stranded. A primer must be
sufficiently long to prime the synthesis of extension products in the presence of the
polymerisation agent. The length of the primer depends on many factors, including
application, temperature to be employed, template reaction conditions, other reagents, and
source of primers. For example, depending on the complexity of the target sequence, the
oligonucleotide primer typically contains 15 to 35 or more nucleotide residues, although it
can contain fewer nucleotide residues. Primers can be large polynucleotides, such as from
about 35 nucleotides to several kilobases or more. A primer can be selected to be
"substantially complementary" to the sequence on the template to which it is designed to
hybridise and serve as a site for the initiation of synthesis. By "substantially
complementary", it is meant that the primer is sufficiently complementary to hybridise
with a target polynucleotide. Preferably, the primer contains no mismatches with the
template to which it is designed to hybridise but this is not essential. For example,
non-complementary nucleotide residues can be attached to the 5' end of the primer, with the
remainder of the primer sequence being complementary to the template. Alternatively,
non-complementary nucleotide residues or a stretch of non-complementary nucleotide
residues can be interspersed into a primer, provided that the primer sequence has sufficient
complementarity with the sequence of the template to hybridise therewith and thereby form
a template for synthesis of the extension product of the primer.

"Probe" refers to a molecule that binds to a specific sequence or sub-sequence or
other moiety of another molecule. Unless otherwise indicated, the term "probe" typically
refers to a polynucleotide probe that binds to another polynucleotide, often called the
"target polynucleotide", through complementary base pairing. Probes can bind target
polynucleotides lacking complete sequence complementarity with the probe, depending on
the stringency of the hybridisation conditions. Probes can be labelled directly or indirectly.

By "recombinant polypeptide" is meant a polypeptide made using recombinant
techniques, i.e., through the expression of a recombinant or synthetic polynucleotide.

Terms used to describe sequence relationships between two or more
polynucleotides or polypeptides include "reference sequence", "comparison window",
"sequence identity", "percentage of sequence identity" and "substantial identity". A
"reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer
units, inclusive of nucleotides and amino acid residues, in length. Because two
polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete
polynucleotide sequence) that is similar between the two polynucleotides, and (2) a
sequence that is divergent between the two polynucleotides, sequence comparisons
between two (or more) polynucleotides are typically performed by comparing sequences of
the two polynucleotides over a "comparison window" to identify and compare local
regions of sequence similarity. A "comparison window" refers to a conceptual segment of
at least 50 contiguous positions, usually about 50 to about 100, more usually about 100 to
about 150 in which a sequence is compared to a reference sequence of the same number of
contiguous positions after the two sequences are optimally aligned. The comparison
window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared
to the reference sequence (which does not comprise additions or deletions) for optimal
alignment of the two sequences. Optimal alignment of sequences for aligning a comparison
window may be conducted by computerised implementations of algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release
7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection
and the best alignment (i.e., resulting in the highest percentage homology over the
comparison window) generated by any of the various methods selected. Reference also
may be made to the BLAST family of programs as for example disclosed by Altschul et
al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be
found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology", John
Wiley & Sons Inc, 1994-1998, Chapter 15.

The term "sequence identity" as used herein refers to the extent that sequences
are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis
over a window of comparison. Thus, a "percentage of sequence identity" is calculated by
comparing two optimally aligned sequences over the window of comparison, determining
the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the
identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys,
Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number
of matched positions, dividing the number of matched positions by the total number of
positions in the window of comparison (i.e., the window size), and multiplying the result
by 100 to yield the percentage of sequence identity. For the purposes of the present
invention, "sequence identity" will be understood to mean the "match percentage"
calculated by the DNASIS computer program (Version 2.5 for Windows; available from
Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using
standard defaults as used in the reference manual accompanying the software.
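
By way of illustration only, the calculation of a percentage of sequence identity described above may be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned and padded to the same length; the function name percent_identity and the use of "-" as a gap symbol are assumptions made for this example, and the sketch does not reproduce the "match percentage" routine of the DNASIS program.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of window positions at which both aligned sequences
    carry the identical base or residue (gap positions never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Number of matched positions in the window of comparison.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    # Divide by the window size and multiply by 100.
    return 100.0 * matches / len(aligned_a)
```

For example, two aligned six-position sequences that agree at five positions give a value of about 83.3%.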

The term "synthetic polynucleotide" as used herein refers to a polynucleotide
formed in vitro by the manipulation of a polynucleotide into a form not normally found in
nature. For example, the synthetic polynucleotide can be in the form of an expression
vector. Generally, such expression vectors include transcriptional and translational
regulatory polynucleotide operably linked to the polynucleotide.

The term "synonymous codon" as used herein refers to a codon having a different
nucleotide sequence than another codon but encoding the same amino acid as that other
codon.
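
The definition of a synonymous codon may be illustrated, purely as a sketch, with the following Python fragment. It assumes the standard genetic code written in the DNA alphabet, with stop codons marked "*"; the names CODON_TABLE, are_synonymous and synonymous_codons are hypothetical and are introduced only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def are_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if the codons differ in sequence but encode the same amino acid."""
    a, b = codon_a.upper(), codon_b.upper()
    return a != b and CODON_TABLE[a] == CODON_TABLE[b]

def synonymous_codons(codon: str) -> list:
    """All codons that are synonymous with the given codon."""
    return sorted(c for c in CODON_TABLE if are_synonymous(codon, c))
```

Under these assumptions GCT and GCC (both alanine) are synonymous, whereas ATG (methionine) has no synonymous codon.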

By "translational efficiency" is meant the efficiency of a cell's protein synthesis
machinery to incorporate the amino acid encoded by a codon into a nascent polypeptide
chain. This efficiency can be evidenced, for example, by the rate at which the cell is able to
synthesise the polypeptide from an RNA template comprising the codon, or by the amount
of the polypeptide synthesised from such a template.

By "vector" is meant a polynucleotide molecule, preferably a DNA molecule
derived, for example, from a plasmid, bacteriophage, yeast or virus, into which a
polynucleotide can be inserted or cloned. A vector preferably contains one or more unique
restriction sites and can be capable of autonomous replication in a defined host cell
including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with
the genome of the defined host such that the cloned sequence is reproducible. Accordingly,
the vector can be an autonomously replicating vector, i.e., a vector that exists as an
extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a
minichromosome, or an artificial chromosome. The vector can contain any means for
assuring self-replication. Alternatively, the vector can be one which, when introduced into
the host cell, is integrated into the genome and replicated together with the chromosome(s)
into which it has been integrated. A vector system can comprise a single vector or plasmid,
two or more vectors or plasmids, which together contain the total DNA to be introduced
into the genome of the host cell, or a transposon. The choice of the vector will typically
depend on the compatibility of the vector with the host cell into which the vector is to be
introduced. In the present case, the vector is preferably a viral or viral-derived vector,
which is operably functional in animal and preferably mammalian cells. Such a vector may
be derived from a poxvirus, an adenovirus or yeast. The vector can also include a selection
marker such as an antibiotic resistance gene that can be used for selection of suitable
transformants. Examples of such resistance genes are known to those of skill in the art and
include the nptII gene that confers resistance to the antibiotics kanamycin and G418
(Geneticin®) and the hph gene which confers resistance to the antibiotic hygromycin B.

2. Synthetic polypeptides

The inventors have surprisingly discovered that the structure of a parent
polypeptide can be disrupted sufficiently to impede, abrogate or otherwise alter at least one
function of the parent polypeptide, while simultaneously minimising the destruction of
potentially useful epitopes that are present in the parent polypeptide, by fusing, coupling or
otherwise linking together different segments of the parent polypeptide in a different
relationship relative to their linkage in the parent polypeptide. As a result of this change in
relationship, the sequence of the linked segments in the resulting synthetic polypeptide is
different to a sequence contained within the parent polypeptide. The synthetic polypeptides
of the invention are useful as immunopotentiating agents, and are referred to elsewhere in
the specification as scrambled antigen vaccines, super attenuated vaccines or "Savines".

Thus, the invention broadly resides in a synthetic polypeptide comprising a
plurality of different segments of at least one parent polypeptide, wherein said segments
are linked together in a different relationship relative to their linkage in the at least one
parent polypeptide.

It is preferable but not essential that the segments in said synthetic polypeptide are
linked sequentially in a different order or arrangement relative to that of corresponding
segments in said at least one parent polypeptide. For example, in the case of a parent
polypeptide that comprises four contiguous or overlapping segments A-B-C-D, these
segments may be linked in 23 other possible orders to form a synthetic polypeptide. These
orders may be selected from the group consisting of: A-B-D-C, A-C-B-D, A-C-D-B,
A-D-B-C, A-D-C-B, B-A-C-D, B-A-D-C, B-C-A-D, B-C-D-A, B-D-A-C, B-D-C-A, C-A-B-D,
C-A-D-B, C-B-A-D, C-B-D-A, C-D-A-B, C-D-B-A, D-A-B-C, D-A-C-B, D-B-A-C,
D-B-C-A, D-C-A-B, and D-C-B-A. Although the rearrangement of the segments is preferably
random, it is especially preferable to exclude or otherwise minimise rearrangements that
result in complete or partial reassembly of the parent sequence (e.g., ADBC, BACD,
DABC). It will be appreciated, however, that the probability of such complete or partial
reassembly diminishes as the number of segments for rearrangement increases.
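
The enumeration of alternative segment orders, and the exclusion of orders that partially reassemble the parent sequence, may be sketched in Python as follows. This is an illustrative sketch only and not a procedure taken from the specification: it assumes that every segment label is distinct, and it treats any order in which two segments adjacent in the parent remain adjacent in the same orientation (for example B-C within A-D-B-C) as a partial reassembly. The function names are hypothetical.

```python
from itertools import permutations

def other_orders(parent):
    """All orders of the parent segments except the parent order itself."""
    parent = tuple(parent)
    return [p for p in permutations(parent) if p != parent]

def retains_parent_junction(order, parent):
    """True if any two segments adjacent in the parent are still adjacent,
    in the same orientation, in the rearranged order."""
    parent_pairs = set(zip(parent, parent[1:]))
    return any(pair in parent_pairs for pair in zip(order, order[1:]))

parent = ("A", "B", "C", "D")
candidates = other_orders(parent)
# Orders in which no parent junction (A-B, B-C or C-D) is reassembled.
fully_scrambled = [
    order for order in candidates
    if not retains_parent_junction(order, parent)
]
```

For the four segments A-B-C-D, candidates holds the 23 other orders listed above and fully_scrambled holds the 11 orders, such as B-D-A-C, in which no parent junction survives.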

The order of the segments is suitably shuffled, reordered or otherwise rearranged
relative to the order in which they exist in the parent polypeptide so that the structure of the
polypeptide is disrupted sufficiently to impede, abrogate or otherwise alter at least one
function associated with the parent polypeptide. Preferably, the segments of the parent
polypeptide are randomly rearranged in the synthetic polypeptide.

The parent polypeptide is suitably a polypeptide that is associated with a disease
or condition. For example, the parent polypeptide may be a polypeptide expressed by a
pathogenic organism or a cancer. Alternatively, the parent polypeptide can be a self
peptide related to an autoimmune disease including, but not limited to, diseases such as
diabetes (e.g., juvenile diabetes), multiple sclerosis, rheumatoid arthritis, myasthenia
gravis, atopic dermatitis, psoriasis and ankylosing spondylitis. Accordingly, the
synthetic molecules of the present invention may also have utility for the induction of
tolerance in a subject afflicted with an autoimmune disease or condition or with an allergy
or other condition to which tolerance is desired. For example, tolerance may be induced by
contacting an immature dendritic cell of the individual to be treated with a synthetic
polypeptide of the invention or by expressing in an immature dendritic cell a synthetic
polynucleotide of the invention. Tolerance may also be induced against antigens causing
allergic responses (e.g., asthma, hay fever). In this case, the parent polypeptide is suitably
an allergenic protein including, but not restricted to, house-dust-mite allergenic proteins as
for example described by Thomas and Smith (1998, Allergy, 53(9): 821-832).

The pathogenic organism includes, but is not restricted to, yeast, a virus, a
bacterium, and a parasite. Any natural host of the pathogenic organism is contemplated by
the present invention and includes, but is not limited to, mammals, avians and fish. In a
preferred embodiment, the pathogenic organism is a virus, which may be an RNA virus or
a DNA virus. Preferably, the RNA virus is Human Immunodeficiency Virus (HIV),
Poliovirus, Influenza virus, Rous sarcoma virus, or a Flavivirus such as Japanese
encephalitis virus. In a preferred embodiment, the RNA virus is a Hepatitis virus including,
but not limited to, Hepatitis strains A, B and C. Suitably, the DNA virus is a Herpesvirus
including, but not limited to, Herpes simplex virus, Epstein-Barr virus, Cytomegalovirus
and Parvovirus. In a preferred embodiment, the virus is HIV and the parent polypeptide is
suitably selected from env, gag, pol, vif, vpr, tat, rev, vpu and nef, or combination thereof.
In an alternate preferred embodiment, the virus is Hepatitis C1a virus and the parent
polypeptide is the Hepatitis C1a virus polyprotein.

In another embodiment, the pathogenic organism is a bacterium, which includes,
but is not restricted to, Neisseria species, Meningococcal species, Haemophilus species,
Salmonella species, Streptococcal species, Legionella species and Mycobacterium species.

In yet another embodiment, the pathogenic organism is a parasite, which includes,
but is not restricted to, Plasmodium species, Schistosoma species, Leishmania species,
Trypanosoma species, Toxoplasma species and Giardia species.

Any cancer or tumour is contemplated by the present invention. For example, the
cancer or tumour includes, but is not restricted to, melanoma, lung cancer, breast cancer,
cervical cancer, prostate cancer, colon cancer, pancreatic cancer, stomach cancer, bladder
cancer, kidney cancer, post transplant lymphoproliferative disease (PTLD), Hodgkin's
Lymphoma and the like. Preferably, the cancer or tumour relates to melanoma. In a
preferred embodiment of this type, the parent polypeptide is a melanocyte differentiation
antigen which is suitably selected from gp100, MART, TRP-1, Tyros, TRP2, MC1R,
MUC1F, MUC1R or a combination thereof. In an alternate preferred embodiment of this
type, the parent polypeptide is a melanoma-specific antigen which is suitably selected from
BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a,
NYNSO1b, LAGE1 or a combination thereof.

In a preferred embodiment, the segments are selected on the basis of size. A
segment according to the invention may be of any suitable size that can be utilised to elicit
an immune response against an antigen encoded by the parent polypeptide. A number of
factors can influence the choice of segment size. For example, the size of a segment should
be preferably chosen such that it includes, or corresponds to the size of, T cell epitopes and
their processing requirements. Practitioners in the art will recognise that class I-restricted T
cell epitopes can be between 8 and 10 amino acids in length and if placed next to unnatural
flanking residues, such epitopes can generally require 2 to 3 natural flanking amino acids
to ensure that they are efficiently processed and presented. Class II-restricted T cell
epitopes can range between 12 and 25 amino acids in length and may not require natural
flanking residues for efficient proteolytic processing although it is believed that natural
flanking residues may play a role. Another important feature of class II-restricted epitopes
is that they generally contain a core of 9-10 amino acids in the middle which bind
specifically to class II MHC molecules with flanking sequences either side of this core
stabilising binding by associating with conserved structures on either side of class II MHC
antigens in a sequence independent manner (Brown et al., 1993). Thus the functional
region of class II-restricted epitopes is typically less than 15 amino acids long. The size of
linear B cell epitopes and the factors affecting their processing, like class II-restricted
epitopes, are quite variable although such epitopes are frequently smaller in size than 15
amino acids. From the foregoing, it is preferable, but not essential, that the size of the
segment is at least 4 amino acids, preferably at least 7 amino acids, more preferably at least
12 amino acids, more preferably at least 20 amino acids and more preferably at least 30
amino acids. Suitably, the size of the segment is less than 2000 amino acids, more
preferably less than 1000 amino acids, more preferably less than 500 amino acids, more
preferably less than 200 amino acids, more preferably less than 100 amino acids, more
preferably less than 80 amino acids and even more preferably less than 60 amino acids and
still even more preferably less than 40 amino acids. In this regard, it is preferable that the
size of the segments is as small as possible so that the synthetic polypeptide adopts a
functionally different structure relative to the structure of the parent polypeptide. It is also
preferable that the size of the segments is large enough to minimise loss of T cell epitopes.
In an especially preferred embodiment, the size of the segment is about 30 amino acids.

An optional spacer may be utilised to space adjacent segments relative to each
other. Accordingly, an optional spacer may be interposed between some or all of the
segments. The spacer suitably alters proteolytic processing and/or presentation of adjacent
segment(s). In a preferred embodiment of this type, the spacer promotes or otherwise
enhances proteolytic processing and/or presentation of adjacent segment(s). Preferably, the
spacer comprises at least one amino acid. The at least one amino acid is suitably a neutral
amino acid. The neutral amino acid is preferably alanine. Alternatively, the at least one
amino acid is cysteine.

In a preferred embodiment, segments are selected such that they have partial
sequence identity or homology with one or more other segments. Suitably, at one or both
ends of a respective segment there is comprised at least 4 contiguous amino acids,
preferably at least 7 contiguous amino acids, more preferably at least 10 contiguous amino
acids, more preferably at least 15 contiguous amino acids and even more preferably at least
20 contiguous amino acids that are identical to, or homologous with, an amino acid
sequence contained within one or more other of said segments. Preferably, at the or each
end of a respective segment there is comprised less than 500 contiguous amino acids, more
preferably less than 200 contiguous amino acids, more preferably less than 100 contiguous
amino acids, more preferably less than 50 contiguous amino acids, more preferably less
than 40 contiguous amino acids, and even more preferably less than 30 contiguous amino
acids that are identical to, or homologous with, an amino acid sequence contained within
one or more other of said segments. Such sequence overlap (also referred to elsewhere in
the specification as "overlapping fragments" or "overlapping segments") is preferable to
ensure potential epitopes at segment boundaries are not lost and to ensure that epitopes at
or near segment boundaries are processed efficiently if placed beside or near amino acids
that inhibit processing. Preferably, the segment size is about twice the size of the overlap.
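
One way in which a parent sequence might be divided into overlapping segments of this kind is sketched below in Python. The sketch is illustrative only: the function name overlapping_segments, the default values (a 30-residue segment with a 15-residue overlap, i.e. a segment size of twice the overlap) and the decision to keep a shorter final segment are all assumptions made for the example.

```python
def overlapping_segments(parent: str, size: int = 30, overlap: int = 15) -> list:
    """Split a parent sequence into segments of `size` residues in which
    each segment shares `overlap` residues with the segment before it."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not parent:
        return []
    step = size - overlap  # distance between successive segment starts
    segments = []
    start = 0
    while True:
        segments.append(parent[start:start + size])
        if start + size >= len(parent):
            break  # this segment reaches the end of the parent sequence
        start += step
    return segments
```

With these defaults, a 100-residue parent sequence gives six segments starting at positions 0, 15, 30, 45, 60 and 75, the last of which is 25 residues long, and the final 15 residues of each segment are identical to the first 15 residues of the next.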

In a preferred embodiment, when segments have partial sequence homology
therebetween, the homologous sequences suitably comprise conserved and/or
non-conserved amino acid differences. Exemplary conservative substitutions are listed in the
following table.

TABLE B

Original residue     Exemplary substitutions
Ala                  Ser
Arg                  Lys
Asn                  Gln, His
Asp                  Glu
Cys                  Ser
Gln                  Asn
Glu                  Asp
Gly                  Pro
His                  Asn, Gln
Ile                  Leu, Val
Leu                  Ile, Val
Lys                  Arg, Gln, Glu
Met                  Leu, Ile
Phe                  Met, Leu, Tyr
Ser                  Thr
Thr                  Ser
Trp                  Tyr
Tyr                  Trp, Phe
Val                  Ile, Leu


Conserved or non-conserved differences may correspond to polymorphisms in
corresponding parent polypeptides. Polymorphic polypeptides are expressed by various
pathogenic organisms and cancers. For example, the polymorphic polypeptides may be
expressed by different viral strains or clades or by cancers in different individuals.

Sequence overlap between respective segments is preferable to minimise
destruction of any epitope sequences that may result from any shuffling or rearrangement
of the segments relative to their existing order in the parent polypeptide. If overlapping
segments as described above are employed to form a synthetic polypeptide, it may not be
necessary to change the order in which those segments are linked together relative to the
order in which corresponding segments are normally present in the parent polypeptide. In
this regard, such overlapping segments when linked together in the synthetic polypeptide
can adopt a different structure relative to the structure of the parent polypeptide, wherein
the different structure does not provide for one or more functions associated with the
parent polypeptide. For example, in the case of four segments A-B-C-D each spanning 30
contiguous amino acids of the parent polypeptide and having a 10-amino acid overlapping
sequence with one or more adjacent segments, the synthetic polypeptide will have
duplicated 10-amino acid sequences bridging segments A-B, B-C and C-D. The presence
of these duplicated sequences may be sufficient to render a different structure and to
abrogate or alter function relative to the parent polypeptide.

In a preferred embodiment, segment size is about 30 amino acids and sequence
overlap at one or both ends of a respective segment is about 15 amino acids. However, it
will be understood that other suitable segment sizes and sequence overlap sizes are
contemplated by the present invention, which can be readily ascertained by persons of skill
in the art.
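
The relationship between segment size and sequence overlap can be illustrated with a
short Python sketch. It is an illustration only, under stated assumptions: the function
name `overlapping_segments` and the toy parent sequences are hypothetical, and a fixed
step of (size minus overlap) residues is assumed, which the specification does not require.

```python
def overlapping_segments(parent: str, size: int = 30, overlap: int = 15) -> list[str]:
    """Split `parent` into windows of `size` residues that share `overlap` residues.

    Hypothetical helper: a fixed step of (size - overlap) residues is an assumption.
    The final window may be shorter than `size` if the lengths do not divide evenly.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(parent), step):
        segments.append(parent[start:start + size])
        if start + size >= len(parent):
            break  # this window already reaches the C-terminus
    return segments


# Toy 60-residue parent (not a real protein) cut with the preferred 30/15 values.
toy_parent = "ACDEFGHIKLMNPQRSTVWY" * 3
for segment in overlapping_segments(toy_parent):
    print(segment)
```

With a size of 30 and an overlap of 15, each residue away from the termini falls within
two segments. With four 30-residue segments overlapping by 10, linking the segments end
to end gives a product carrying a duplicated 10-residue stretch at each of the three
junctions, as in the A-B-C-D example above.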

It is preferable but not necessary to utilise all the segments of the parent
polypeptide in the construction of the synthetic polypeptide. Suitably, at least 30%,
preferably at least 40%, more preferably at least 50%, even more preferably at least 60%,
even more preferably at least 70%, even more preferably at least 80% and still even more
preferably at least 90% of the parent polypeptide sequence is used in the construction of
the synthetic polypeptide. However, it will be understood that the more sequence
information from a parent polypeptide that is utilised to construct the synthetic
polypeptide, the greater the population coverage will be of the synthetic polypeptide as an
immunogen. Preferably, no sequence information from the parent polypeptide is excluded
(e.g., because of an apparent lack of immunological epitopes).

Persons of skill in the art will appreciate that when preparing a synthetic
polypeptide against a pathogenic organism (e.g., a virus) or a cancer, it may be preferable
to use sequence information from a plurality of different polypeptides expressed by the
organism or the cancer. Accordingly, in a preferred embodiment, segments from a plurality
of different polypeptides are linked together to form a synthetic polypeptide according to
the invention. It is preferable in this respect to utilise as many parent polypeptides as
possible from, or in relation to, a particular source in the construction of the synthetic
polypeptide. The source of parent polypeptides includes, but is not limited to, a pathogenic
organism and a cancer. Suitably, at least about 30%, preferably at least 40%, more
preferably at least 50%, even more preferably at least 60%, even more preferably at least
70%, even more preferably at least 80% and still even more preferably at least 90% of the
parent polypeptides expressed by the source are used in the construction of the synthetic
polypeptide. Preferably, parent polypeptides from a virus include, but are not restricted to,
latent polypeptides, regulatory polypeptides or polypeptides expressed early during their
replication cycle. Suitably, parent polypeptides from a parasite or bacterium include, but
are not restricted to, secretory polypeptides and polypeptides expressed on the surface of
the parasite or bacterium. It is preferred that parent polypeptides from a cancer or tumour
are cancer specific polypeptides.

Suitably, hypervariable sequences within the parent polypeptide are excluded 
from the construction of the synthetic polypeptide. 

The synthetic polypeptides of the invention may be prepared by any suitable
procedure known to those of skill in the art. For example, the polypeptide may be
synthesised using solution synthesis or solid phase synthesis as described, for example, in
Chapter 9 of Atherton and Shephard (1989, Solid Phase Peptide Synthesis: A Practical
Approach. IRL Press, Oxford) and in Roberge et al. (1995, Science 269: 202). Syntheses
may employ, for example, either t-butyloxycarbonyl (t-Boc) or
9-fluorenylmethyloxycarbonyl (Fmoc) chemistries (see Chapter 9.1 of Coligan et al.,
CURRENT PROTOCOLS IN PROTEIN SCIENCE, John Wiley & Sons, Inc. 1995-1997;
Stewart and Young, 1984, Solid Phase Peptide Synthesis, 2nd ed. Pierce Chemical Co.,
Rockford, Ill.; and Atherton and Shephard, supra).

Alternatively, the polypeptides may be prepared by a procedure including the
steps of:

(a) preparing a synthetic construct including a synthetic polynucleotide encoding
a synthetic polypeptide wherein said synthetic polynucleotide is operably linked to a
regulatory polynucleotide, wherein said synthetic polypeptide comprises a plurality of
different segments of a parent polypeptide, wherein said segments are linked together
in a different relationship relative to their linkage in the parent polypeptide;

(b) introducing the synthetic construct into a suitable host cell;

(c) culturing the host cell to express the synthetic polypeptide from said synthetic
construct; and

(d) isolating the synthetic polypeptide.

The synthetic construct is preferably in the form of an expression vector. For
example, the expression vector can be a self-replicating extra-chromosomal vector such as
a plasmid, or a vector that integrates into a host genome. Typically, the regulatory
polynucleotide may include, but is not limited to, promoter sequences, leader or signal
sequences, ribosomal binding sites, transcriptional start and stop sequences, translational
start and termination sequences, and enhancer or activator sequences. Constitutive or
inducible promoters as known in the art are contemplated by the invention. The promoters
may be either naturally occurring promoters, or hybrid promoters that combine elements of
more than one promoter. The regulatory polynucleotide will generally be appropriate for
the host cell used for expression. Numerous types of appropriate expression vectors and
suitable regulatory polynucleotides are known in the art for a variety of host cells.

In a preferred embodiment, the expression vector contains a selectable marker
gene to allow the selection of transformed host cells. Selection genes are well known in the
art and will vary with the host cell used.

The expression vector may also include a fusion partner (typically provided by the
expression vector) so that the synthetic polypeptide of the invention is expressed as a
fusion polypeptide with said fusion partner. The main advantage of fusion partners is that
they assist identification and/or purification of said fusion polypeptide. In order to express
said fusion polypeptide, it is necessary to ligate a polynucleotide according to the invention
into the expression vector so that the translational reading frames of the fusion partner and
the polynucleotide coincide.

Well known examples of fusion partners include, but are not limited to,
glutathione-S-transferase (GST), Fc portion of human IgG, maltose binding protein (MBP)
and hexahistidine (HIS6), which are particularly useful for isolation of the fusion
polypeptide by affinity chromatography. For the purposes of fusion polypeptide
purification by affinity chromatography, relevant matrices for affinity chromatography are
glutathione-, amylose-, and nickel- or cobalt-conjugated resins respectively. Many such
matrices are available in "kit" form, such as the QIAexpress™ system (Qiagen) useful with
(HIS6) fusion partners and the Pharmacia GST purification system. In a preferred
embodiment, the recombinant polynucleotide is expressed in the commercial vector
pFLAG™.

Another fusion partner well known in the art is green fluorescent protein (GFP).
This fusion partner serves as a fluorescent "tag" which allows the fusion polypeptide of the
invention to be identified by fluorescence microscopy or by flow cytometry. The GFP tag
is useful when assessing subcellular localisation of a fusion polypeptide of the invention,
or for isolating cells which express a fusion polypeptide of the invention. Flow cytometric
methods such as fluorescence activated cell sorting (FACS) are particularly useful in this
latter application. Preferably, the fusion partners also have protease cleavage sites, such as
for Factor Xa, Thrombin and inteins (protein introns), which allow the relevant protease to
partially digest the fusion polypeptide of the invention and thereby liberate the
recombinant polypeptide of the invention therefrom. The liberated polypeptide can then be
isolated from the fusion partner by subsequent chromatographic separation. Fusion
partners according to the invention also include within their scope "epitope tags", which
are usually short peptide sequences for which a specific antibody is available. Well known
examples of epitope tags for which specific monoclonal antibodies are readily available
include c-Myc, influenza virus haemagglutinin and FLAG tags. Alternatively, a fusion
partner may be provided to promote other forms of immunity. For example, the fusion
partner may be an antigen-binding molecule that is immuno-interactive with a
conformational epitope on a target antigen or with a post-translational modification of a
target antigen (e.g., an antigen-binding molecule that is immuno-interactive with a
glycosylated target antigen).

The step of introducing the synthetic construct into the host cell may be effected
by any suitable method including transfection and transformation, the choice of which will
be dependent on the host cell employed. Such methods are well known to those of skill in
the art.

Synthetic polypeptides of the invention may be produced by culturing a host cell 
transformed with the synthetic construct. The conditions appropriate for protein expression
will vary with the choice of expression vector and the host cell. This is easily ascertained 
by one skilled in the art through routine experimentation. 

Suitable host cells for expression may be prokaryotic or eukaryotic. One preferred
host cell for expression of a polypeptide according to the invention is a bacterium. The
bacterium used may be Escherichia coli. Alternatively, the host cell may be an insect cell 
such as, for example, SF9 cells that may be utilised with a baculovirus expression system. 

The synthetic polypeptide may be conveniently prepared by a person skilled in the
art using standard protocols as for example described in Sambrook, et al., MOLECULAR
CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989), in particular
Sections 16 and 17; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley & Sons, Inc. 1994-1998), in particular Chapters 10 and 16; and
Coligan et al., CURRENT PROTOCOLS IN PROTEIN SCIENCE (John Wiley & Sons,
Inc. 1995-1997), in particular Chapters 1, 5 and 6.

The amino acids of the synthetic polypeptide can be any non-naturally occurring
or any naturally occurring amino acid. Examples of incorporating unnatural amino acids and
derivatives during peptide synthesis include, but are not limited to, use of 4-amino butyric
acid, 6-aminohexanoic acid, 4-amino-3-hydroxy-5-phenylpentanoic acid,
4-amino-3-hydroxy-6-methylheptanoic acid, t-butylglycine, norleucine, norvaline,
phenylglycine, ornithine, sarcosine, 2-thienyl alanine and/or D-isomers of amino acids. A
list of unnatural amino acids contemplated by the present invention is shown in TABLE C.



TABLE C

α-aminobutyric acid               L-N-methylalanine
α-amino-α-methylbutyrate          L-N-methylarginine
aminocyclopropane-carboxylate     L-N-methylasparagine
aminoisobutyric acid              L-N-methylaspartic acid
aminonorbornyl-carboxylate        L-N-methylcysteine
cyclohexylalanine                 L-N-methylglutamine
cyclopentylalanine                L-N-methylglutamic acid
L-N-methylisoleucine              L-N-methylhistidine
D-alanine                         L-N-methylleucine
D-arginine                        L-N-methyllysine
D-aspartic acid                   L-N-methylmethionine
D-cysteine                        L-N-methylnorleucine
D-glutamate                       L-N-methylnorvaline
D-glutamic acid                   L-N-methylornithine
D-histidine                       L-N-methylphenylalanine
D-isoleucine                      L-N-methylproline
D-leucine                         L-N-methylserine
D-lysine                          L-N-methylthreonine
D-methionine                      L-N-methyltryptophan
D-ornithine                       L-N-methyltyrosine
D-phenylalanine                   L-N-methylvaline
D-proline                         L-N-methylethylglycine
D-serine                          L-N-methyl-t-butylglycine
D-threonine                       L-norleucine
D-tryptophan                      L-norvaline
D-tyrosine                        α-methyl-aminoisobutyrate
D-valine                          α-methyl-γ-aminobutyrate
D-α-methylalanine                 α-methylcyclohexylalanine
D-α-methylarginine                α-methylcyclopentylalanine
D-α-methylasparagine              α-methyl-α-naphthylalanine
D-α-methylaspartate               α-methylpenicillamine
D-α-methylcysteine                N-(4-aminobutyl)glycine
D-α-methylglutamine               N-(2-aminoethyl)glycine
D-α-methylhistidine               N-(3-aminopropyl)glycine
D-α-methylisoleucine              N-amino-α-methylbutyrate
D-α-methylleucine                 α-naphthylalanine
D-α-methyllysine                  N-benzylglycine
D-α-methylmethionine              N-(2-carbamylethyl)glycine
D-α-methylornithine               N-(carbamylmethyl)glycine
D-α-methylphenylalanine           N-(2-carboxyethyl)glycine
D-α-methylproline                 N-(carboxymethyl)glycine
D-α-methylserine                  N-cyclobutylglycine
D-α-methylthreonine               N-cycloheptylglycine
D-α-methyltryptophan              N-cyclohexylglycine
D-α-methyltyrosine                N-cyclodecylglycine
L-α-methylleucine                 L-α-methyllysine
L-α-methylmethionine              L-α-methylnorleucine
L-α-methylnorvaline               L-α-methylornithine
L-α-methylphenylalanine           L-α-methylproline
L-α-methylserine                  L-α-methylthreonine
L-α-methyltryptophan              L-α-methyltyrosine
L-α-methylvaline                  L-N-methylhomophenylalanine
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-ethylamino)cyclopropane





The invention also contemplates modifying the synthetic polypeptides of the
invention using ordinary molecular biological techniques so as to alter their resistance to
proteolytic degradation or to optimise solubility properties or to render them more suitable
as an immunogenic agent.

3. Preparation of synthetic polynucleotides of the invention 

The invention contemplates synthetic polynucleotides encoding the synthetic
polypeptides as for example described in Section 2 supra. Polynucleotides encoding
segments of a parent polypeptide can be produced by any suitable technique. For example,
such polynucleotides can be synthesised de novo using readily available machinery.
Sequential synthesis of DNA is described, for example, in U.S. Patent No 4,293,652.
Instead of de novo synthesis, recombinant techniques may be employed including use of
restriction endonucleases to cleave a polynucleotide encoding at least a segment of the
parent polypeptide and use of ligases to ligate together in frame a plurality of cleaved
polynucleotides encoding different segments of the parent polypeptide. Suitable
recombinant techniques are described for example in the relevant sections of Ausubel, et
al. (supra) and of Sambrook, et al. (supra) which are incorporated herein by reference.
Preferably, the synthetic polynucleotide is constructed using splicing by overlapping
extension (SOEing) as for example described by Horton et al. (1990, Biotechniques 8(5):
528-535; 1995, Mol Biotechnol 3(2): 93-99; and 1997, Methods Mol Biol. 67: 141-149).
However, it should be noted that the present invention is not dependent on, and not
directed to, any one particular technique for constructing the synthetic construct.

Various modifications to the synthetic polynucleotides may be introduced as a
means of increasing intracellular stability and half-life. Possible modifications include but
are not limited to the addition of flanking sequences of ribo- or deoxy- nucleotides to the 5'
and/or 3' ends of the molecule or the use of phosphorothioate or 2' O-methyl rather than
phosphodiester linkages within the oligodeoxyribonucleotide backbone.

The invention therefore contemplates a method of producing a synthetic
polynucleotide as broadly described above, comprising linking together in the same
reading frame at least two nucleic acid sequences encoding different segments of a parent
polypeptide to form a synthetic polynucleotide, which encodes a synthetic polypeptide
according to the invention. Suitably, nucleic acid sequences encoding at least 10 segments,
preferably at least 20 segments, more preferably at least 40 segments and more preferably
at least 100 segments of a parent polypeptide are employed to produce the synthetic
polynucleotide.

Preferably, the method further comprises selecting segments of the parent 
polypeptide, reverse translating the selected segments and preparing nucleic acid 
sequences encoding the selected segments. It is preferred that the method further comprises 
randomly linking the nucleic acid sequences together to form the synthetic polynucleotide. 
The nucleic acid sequences may be oligonucleotides or polynucleotides.

Suitably, segments are selected on the basis of size. Additionally, or in the
alternative, segments are selected such that they have partial sequence identity or
homology (i.e., sequence overlap) with one or more other segments. A number of factors
can influence segment size and sequence overlap as mentioned above. In the case of
sequence overlap, large amounts of duplicated nucleic acid sequences can sometimes result
in sections of nucleic acid being lost during nucleic acid amplification (e.g., polymerase
chain reaction, PCR) of such sequences, recombinant plasmid propagation in a bacterial
host or during amplification of recombinant viruses containing such sequences.
Accordingly, in a preferred embodiment, nucleic acid sequences encoding segments having
sequence identity or homology with one or more other encoded segments are not linked
together in an arrangement in which the identical or homologous sequences are contiguous.
Also, it is preferable that different codons are used to encode a specific amino acid in a
duplicated region. In this context, an amino acid of a parent polypeptide sequence is
preferably reverse translated to provide a codon which, in the context of adjacent or local
sequence elements, has a lower propensity of forming an undesirable sequence (e.g., a
duplicated sequence or a palindromic sequence) that is refractory to the execution of a task
(e.g., cloning or sequencing). Alternatively, segments may be selected such that they
contain a carboxyl terminal leucine residue or such that reverse translated sequences
encoding the segments contain restriction enzyme sites for convenient splicing of the
reverse translated sequences.
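
One way to arrive at an arrangement in which overlapping segments are not contiguous is
to shuffle the segment order at random and reject any order that places two overlapping
segments side by side. The Python sketch below is a minimal illustration of that idea
rather than a prescribed procedure. It assumes that only segments adjacent in the parent
polypeptide (indices i and i+1) overlap; the function name `shuffle_avoiding_neighbours`,
the seed and the retry limit are hypothetical.

```python
import random


def shuffle_avoiding_neighbours(n_segments: int, seed: int = 0,
                                max_tries: int = 10_000) -> list[int]:
    """Return a random order of segment indices with no parent-adjacent pair contiguous.

    Assumption: segment i overlaps only segments i - 1 and i + 1 of the parent.
    Rejection sampling is used; no valid order exists for 2 or 3 segments.
    """
    rng = random.Random(seed)  # seeded so that the arrangement is reproducible
    order = list(range(n_segments))
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(abs(a - b) != 1 for a, b in zip(order, order[1:])):
            return list(order)
    raise RuntimeError("no acceptable segment order found")


# Arrange ten overlapping segments so that duplicated sequences are never adjacent.
print(shuffle_avoiding_neighbours(10))
```

Rejection sampling is chosen here only for brevity; any method that yields an order
meeting the same constraint would serve equally well.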

The method optionally further comprises linking a spacer oligonucleotide
encoding at least one spacer residue between segment-encoding nucleic acids. Such spacer
residue(s) may be advantageous in ensuring that epitopes within the segments are
processed and presented efficiently. Preferably, the spacer oligonucleotide encodes 2 to 3
spacer residues. The spacer residue is suitably a neutral amino acid, which is preferably
alanine.
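
By way of illustration only, the linking of segment-encoding nucleic acids through an
alanine spacer can be sketched in Python as follows. The function name `link_with_spacer`
and the toy sequences are hypothetical, and GCC is used as the alanine codon purely as an
assumption (any of the four GCN codons encodes alanine).

```python
ALA_CODON = "GCC"  # assumed choice; GCA, GCG and GCT also encode alanine


def link_with_spacer(segment_dnas: list[str], n_spacer_residues: int = 2) -> str:
    """Join in-frame segment-encoding sequences with an alanine spacer oligonucleotide."""
    for dna in segment_dnas:
        if len(dna) % 3:
            raise ValueError("segment does not contain a whole number of codons")
    spacer = ALA_CODON * n_spacer_residues
    return spacer.join(segment_dnas)


# Two toy segments (Met-Lys and Leu-Val) joined by an Ala-Ala spacer.
print(link_with_spacer(["ATGAAA", "CTGGTG"]))
```

Because each segment and the spacer are whole numbers of codons, the reading frame is
preserved across every junction.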

Optionally, the method further comprises linking in the same reading frame as
other segment-containing nucleic acid sequences at least one variant nucleic acid sequence
which encodes a variant segment having a homologous but not identical amino acid
sequence relative to other encoded segments. Suitably, the variant segment comprises
conserved and/or non-conserved amino acid differences relative to one or more other
encoded segments. Such differences may correspond to polymorphisms as discussed
above. In a preferred embodiment, degenerate bases are designed or built in to the at least
one variant nucleic acid sequence to give rise to all desired homologous sequences.
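
The effect of building degenerate bases into a variant nucleic acid sequence can be seen
by expanding a degenerate sequence into every sequence it represents. The Python sketch
below uses the standard IUPAC nucleotide ambiguity codes; the function name
`expand_degenerate` and the example codon are hypothetical and serve only as an
illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(seq: str) -> list[str]:
    """List every unambiguous sequence represented by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq.upper()))]


# The degenerate codon RTT covers ATT (Ile) and GTT (Val), i.e. an Ile/Val polymorphism.
print(expand_degenerate("RTT"))
```

The number of encoded variants is the product of the degeneracies at each position, so
it grows quickly as more degenerate positions are introduced.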

When a large number of polymorphisms is intended to be covered, it is preferred
that multiple synthetic polynucleotides are constructed rather than a single synthetic
polynucleotide, which encodes all variant segments. For example, if there is less than 85%
homology between polymorphic polypeptides, then it is preferred that more than one
synthetic polynucleotide is constructed.

Preferably, the method further comprises optimising the codon composition of the
synthetic polynucleotide such that it is translated efficiently by a host cell. In this regard, it
is well known that the translational efficiency of different codons varies between
organisms and that such differences in codon usage can be utilised to enhance the level of
protein expression in a particular organism. In this regard, reference may be made to Seed
et al. (International Application Publication No WO 96/09378) who disclose the
replacement of existing codons in a parent polynucleotide with synonymous codons to
enhance expression of viral polypeptides in mammalian host cells. Preferably, the first or
second most frequently used codons are employed for codon optimisation.
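
Reverse translation with the first or second most frequently used codon can be sketched
in Python as below. This is a hedged illustration, not the method of Seed et al.: the
function name `reverse_translate` is hypothetical, and `EXAMPLE_RANKING` is an assumed,
illustrative ranking for four amino acids rather than measured codon usage data. In
practice the ranking would be taken from a codon usage table for the intended host.

```python
# Illustrative ranking only (most preferred codon first); not measured usage data.
EXAMPLE_RANKING = {
    "M": ["ATG"],
    "K": ["AAG", "AAA"],
    "L": ["CTG", "CTC"],
    "A": ["GCC", "GCT"],
}


def reverse_translate(peptide: str, ranked_codons: dict[str, list[str]],
                      rank: int = 0) -> str:
    """Reverse translate `peptide` using the codon at position `rank` in each ranking.

    rank=0 selects the most preferred codon and rank=1 the second most preferred.
    Amino acids with fewer synonymous codons fall back to their last listed codon.
    """
    codons = []
    for residue in peptide:
        options = ranked_codons[residue]
        codons.append(options[min(rank, len(options) - 1)])
    return "".join(codons)


first_copy = reverse_translate("MKLA", EXAMPLE_RANKING, rank=0)
second_copy = reverse_translate("MKLA", EXAMPLE_RANKING, rank=1)
print(first_copy, second_copy)
```

Using `rank=0` for one copy of a duplicated region and `rank=1` for the other gives two
nucleic acid sequences that encode the same amino acids but differ at most codons, which
is one way of reducing nucleotide-level duplication.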

Preferably, gene splicing by overlap extension or "gene SOEing" (supra) is
employed for the construction of the synthetic polynucleotide. Gene SOEing is a PCR-based
method of recombining DNA sequences without reliance on restriction sites and of directly
generating mutated DNA fragments in vitro. By modifying the sequences incorporated into
the 5'-ends of the primers, any pair of PCR products can be made to share a common
sequence at one end. Under PCR conditions, the common sequence allows strands from
two different fragments to hybridise to one another, forming an overlap. Extension of this
overlap by DNA polymerase yields a recombinant molecule. However, a problem with
long synthetic constructs is that mutations generally incorporate into amplified products
during synthesis. In this instance, it is preferred that resolvase treatment is employed at
various steps of the synthesis. Resolvases are bacteriophage-encoded endonucleases which
recognise disruptions or mispairing of double stranded DNA and are primarily used by
bacteriophages to resolve Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). For
example, T7 endonuclease I can be employed in synthetic DNA constructions to recognise
mutations and cleave corrupted dsDNA. The mutated DNA strands are then hybridised to
non-mutant or correct DNA sequences, which results in a mispairing of DNA bases. The
mispaired bases are recognised by the resolvase, which then cleaves the DNA nearby
leaving only correctly hybridised sequences intact. Preferably a thermostable resolvase
enzyme is employed during splicing or amplification so that errors are not incorporated in
downstream synthesis products.

Synthetic polynucleotides according to the invention can be operably linked to a
regulatory polynucleotide in the form of a synthetic construct as for example described in
Section 2 supra. Synthetic constructs of the invention have utility inter alia as nucleic acid
vaccines. The choice of regulatory polynucleotide and synthetic construct will depend on
the intended host.

Exemplary expression vectors for expression of a synthetic polypeptide according
to the invention include, but are not restricted to, modified Ankara Vaccinia virus as for
example described by Allen et al. (2000, J. Immunol. 164(9): 4968-4978), fowlpox virus as
for example described by Boyle and Coupar (1988, Virus Res. 10: 343-356) and the herpes
simplex amplicons described for example by Fong et al. in U.S. Patent No. 6,051,428.
Alternatively, Adenovirus and Epstein-Barr virus vectors, which are preferably capable of
accepting large amounts of DNA or RNA sequence information, can be used.

Preferred promoter sequences that can be utilised for expression of synthetic
polypeptides include the P7.5 or PE/L promoters as for example disclosed by Kumar and
Boyle (1990, Virology 179: 151-158), and the CMV and RSV promoters.

The synthetic construct optionally further includes a nucleic acid sequence
encoding an immunostimulatory molecule. The immunostimulatory molecule may be a
fusion partner of the synthetic polypeptide. Alternatively, the immunostimulatory molecule
may be translated separately from the synthetic polypeptide. Preferably, the
immunostimulatory molecule comprises a general immunostimulatory peptide sequence.
For example, the immunostimulatory peptide sequence may comprise a domain of an
invasin protein (Inv) from the bacterium Yersinia spp. as for example disclosed by Brett et al.
(1993, Eur. J. Immunol. 23: 1608-1614). This immune stimulatory property results from
the capability of this invasin domain to interact with the β1 integrin molecules present on T
cells, particularly activated immune or memory T cells. A preferred embodiment of the
invasin domain (Inv) for linkage to a synthetic polypeptide has been previously described
in U.S. Pat. No. 5,759,551. The said Inv domain has the sequence: Thr-Ala-Lys-Ser-Lys-
Lys-Phe-Pro-Ser-Tyr-Thr-Ala-Thr-Tyr-Gln-Phe [SEQ ID NO: 1467] or is an immune
stimulatory homologue thereof from the corresponding region in another Yersinia species
invasin protein. Such homologues thus may contain substitutions, deletions or insertions of
amino acid residues to accommodate strain to strain variation, provided that the
homologues retain immune stimulatory properties. The general immunostimulatory
sequence may optionally be linked to the synthetic polypeptide by a spacer sequence.

In an alternate embodiment, the immunostimulatory molecule may comprise an
immunostimulatory membrane or soluble molecule, which is suitably a T cell
co-stimulatory molecule. Preferably, the T cell co-stimulatory molecule is a B7 molecule or a
biologically active fragment thereof, or a variant or derivative of these. The B7 molecule
includes, but is not restricted to, B7-1 and B7-2. Preferably, the B7 molecule is B7-1.
Alternatively, the T cell co-stimulatory molecule may be an ICAM molecule such as
ICAM-1 and ICAM-2.

In another embodiment, the immunostimulatory molecule can be a cytokine,
which includes, but is not restricted to, an interleukin, a lymphokine, tumour necrosis
factor and an interferon. Alternatively, the immunostimulatory molecule may comprise an 
immunomodulatory oligonucleotide as for example disclosed by Krieg in U.S. Patent No. 
6,008,200. 

Suitably, the size of the synthetic polynucleotide does not exceed the ability of
host cells to transcribe, translate or proteolytically process and present epitopes to the
immune system. Practitioners in the art will also recognise that the size of the synthetic
polynucleotide can impact on the capacity of an expression vector to express the synthetic
polynucleotide in a host cell. In this connection, it is known that the efficacy of DNA
vaccination reduces with expression vectors greater than 20 kb. In such situations it is
preferred that a larger number of smaller synthetic constructs is utilised rather than a single
large synthetic construct.
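
Dividing a large set of segment-encoding sequences among several smaller constructs can
be sketched as a simple greedy packing, shown below in Python. The sketch is illustrative
only: the function name `split_into_constructs` is hypothetical, and the size limit
`max_len` is an assumed per-construct insert budget chosen by the practitioner, not a
value taken from the specification.

```python
def split_into_constructs(segment_lengths: list[int], max_len: int) -> list[list[int]]:
    """Group segment indices, in order, into constructs whose inserts fit within max_len.

    `segment_lengths` holds the nucleotide length of each segment-encoding sequence.
    Greedy packing: a new construct is started whenever the next segment would not fit.
    """
    constructs: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, length in enumerate(segment_lengths):
        if length > max_len:
            raise ValueError(f"segment {index} alone exceeds the size limit")
        if current and used + length > max_len:
            constructs.append(current)
            current, used = [], 0
        current.append(index)
        used += length
    if current:
        constructs.append(current)
    return constructs


# Five 90-nucleotide segments with an assumed 200-nucleotide budget per construct.
print(split_into_constructs([90] * 5, max_len=200))
```

Greedy packing keeps segments in their given order; other groupings, for example
by parent polypeptide, are equally possible.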

4. Immunopotentiating compositions 

The invention also contemplates a composition, comprising an
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
described in Section 2, and a synthetic polynucleotide or a synthetic construct as described
in Section 3, together with a pharmaceutically acceptable carrier. One or more
immunopotentiating agents can be used as actives in the preparation of
immunopotentiating compositions. Such preparation uses routine methods known to
persons skilled in the art. Typically, such compositions are prepared as injectables, either
as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection may also be prepared. The preparation may also be emulsified. The
active immunogenic ingredients are often mixed with excipients that are pharmaceutically
acceptable and compatible with the active ingredient. Suitable excipients are, for example,
water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition,
if desired, the vaccine may contain minor amounts of auxiliary substances such as wetting
or emulsifying agents, pH buffering agents, and/or adjuvants that enhance the effectiveness
of the vaccine. Examples of adjuvants which may be effective include but are not limited
to: aluminium hydroxide, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (CGP 1983A, referred to as MTP-PE), and RIBI,
which contains three components extracted from bacteria, monophosphoryl lipid A,
trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in a 2% squalene/Tween
80 emulsion. For example, the effectiveness of an adjuvant may be determined by
measuring the amount of antibodies resulting from the administration of the composition,
wherein those antibodies are directed against one or more antigens presented by the treated
cells of the composition.

The immunopotentiating agents may be formulated into a composition as neutral
or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed
with free amino groups of the peptide) and which are formed with inorganic acids such as,
for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic,
tartaric, maleic, and the like. Salts formed with the free carboxyl groups may also be
derived from inorganic bases such as, for example, sodium, potassium, ammonium,
calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
2-ethylamino ethanol, histidine, procaine, and the like.

If desired, devices or compositions containing the immunopotentiating agents
suitable for sustained or intermittent release could be, in effect, implanted in the body or
topically applied thereto for the relatively slow release of such materials into the body.

The compositions are conventionally administered parenterally, by injection, for
example, either subcutaneously or intramuscularly. Additional formulations which are
suitable for other modes of administration include suppositories and, in some cases, oral
formulations. For suppositories, traditional binders and carriers may include, for example,
polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures
containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%. Oral
formulations include such normally employed excipients as, for example, pharmaceutical
grades of mannitol, lactose, starch, magnesium carbonate, and the like. These compositions
take the form of solutions, suspensions, tablets, pills, capsules, sustained release
formulations or powders and contain 10%-95% of active ingredient, preferably 25%-70%.

Administration of the gene therapy construct to said mammal, preferably a
human, may include delivery via direct oral intake, systemic injection, or delivery to
selected tissue(s) or cells, or indirectly via delivery to cells isolated from the mammal or a
compatible donor. An example of the latter approach would be stem cell therapy, wherein
isolated stem cells having potential for growth and differentiation are transfected with the
vector comprising the Sox18 nucleic acid. The stem cells are cultured for a period and then
transferred to the mammal being treated.

With regard to nucleic acid based compositions, all modes of delivery of such
compositions are contemplated by the present invention. Delivery of these compositions to
cells or tissues of an animal may be facilitated by microprojectile bombardment, liposome
mediated transfection (e.g., lipofectin or lipofectamine), electroporation, calcium
phosphate or DEAE-dextran-mediated transfection, for example. In an alternate
embodiment, a synthetic construct may be used as a therapeutic or prophylactic
composition in the form of a "naked DNA" composition as is known in the art. A
discussion of suitable delivery methods may be found in Chapter 9 of CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Eds. Ausubel et al.; John Wiley & Sons
Inc., 1997 Edition) or on the Internet site DNAvaccine.com. The compositions may be
administered by intradermal (e.g., using panjet™ delivery) or intramuscular routes.

The step of introducing the synthetic polynucleotide into a target cell will differ
depending on the intended use and species, and can involve one or more of non-viral and
viral vectors, cationic liposomes, retroviruses, and adenoviruses such as, for example,
described in Mulligan, R.C., (1993 Science 260 926-932) which is hereby incorporated by
reference. Such methods can include, for example:

A. Local application of the synthetic polynucleotide by injection (Wolff et al., 1990,
Science 247 1465-1468, which is hereby incorporated by reference), surgical
implantation, instillation or any other means. This method can also be used in
combination with local application by injection, surgical implantation, instillation or
any other means, of cells responsive to the protein encoded by the synthetic
polynucleotide so as to increase the effectiveness of that treatment. This method can
also be used in combination with local application by injection, surgical implantation,
instillation or any other means, of another factor or factors required for the activity of
said protein.

B. General systemic delivery by injection of DNA (Calabretta et al., 1993, Cancer Treat.
Rev. 19 169-179, which is incorporated herein by reference), or RNA, alone or in
combination with liposomes (Zhu et al., 1993, Science 261 209-212, which is
incorporated herein by reference), viral capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is incorporated herein by reference) or any
other mediator of delivery. Improved targeting might be achieved by linking the
synthetic polynucleotide to a targeting molecule (the so-called "magic bullet" approach
employing, for example, an antibody), or by local application by injection, surgical
implantation or any other means, of another factor or factors required for the activity of
the protein encoded by said synthetic polynucleotide, or of cells responsive to said
protein.

C. Injection or implantation or delivery by any means, of cells that have been modified ex
vivo by transfection (for example, in the presence of calcium phosphate: Chen et al.,
1987, Mole. Cell Biochem. 7 2745-2752, or of cationic lipids and polyamines: Rose et
al., 1991, BioTech. 10 520-525, which articles are incorporated herein by reference),
infection, injection, electroporation (Shigekawa et al., 1988, BioTech. 6 742-751,
which is incorporated herein by reference) or any other way so as to increase the
expression of said synthetic polynucleotide in those cells. The modification can be
mediated by plasmid, bacteriophage, cosmid, viral (such as adenoviral or retroviral;
Mulligan, 1993, Science 260 926-932; Miller, 1992, Nature 357 455-460; Salmons et
al., 1993, Hum. Gen. Ther. 4 129-141, which articles are incorporated herein by
reference) or other vectors, or other agents of modification such as liposomes (Zhu et
al., 1993, Science 261 209-212, which is incorporated herein by reference), viral
capsids or nanoparticles (Bertling et al., 1991, Biotech. Appl. Biochem. 13 390-405,
which is incorporated herein by reference), or any other mediator of modification. The
use of cells as a delivery vehicle for genes or gene products has been described by Barr
et al., 1991, Science 254 1507-1512 and by Dhawan et al., 1991, Science 254 1509-1512,
which articles are incorporated herein by reference. Treated cells can be
delivered in combination with any nutrient, growth factor, matrix or other agent that
will promote their survival in the treated subject.

Also encompassed by the present invention is a method for treatment and/or
prophylaxis of a disease or condition, comprising administering to a patient in need of such
treatment a therapeutically effective amount of a composition as broadly described above.
The disease or condition may be caused by a pathogenic organism or a cancer as for
example described above.

In a preferred embodiment, the immunopotentiating composition of the invention
is suitable for treatment of, or prophylaxis against, a cancer. Cancers which could be
suitably treated in accordance with the practices of this invention include cancers of the
lung, breast, ovary, cervix, colon, head and neck, pancreas, prostate, stomach, bladder,
kidney, bone, liver, oesophagus, brain, testicle, uterus, melanoma and the various leukaemias
and lymphomas.

In an alternate embodiment, the immunopotentiating composition is suitable for
treatment of, or prophylaxis against, a viral, bacterial or parasitic infection. Viral infections
contemplated by the present invention include, but are not restricted to, infections caused
by HIV, Hepatitis, Influenza, Japanese encephalitis virus, Epstein-Barr virus and
respiratory syncytial virus. Bacterial infections include, but are not restricted to, those
caused by Neisseria species, Meningococcal species, Haemophilus species, Salmonella
species, Streptococcal species, Legionella species and Mycobacterium species. Parasitic
infections encompassed by the invention include, but are not restricted to, those caused by
Plasmodium species, Schistosoma species, Leishmania species, Trypanosoma species,
Toxoplasma species and Giardia species.

The above compositions or vaccines may be administered in a manner compatible
with the dosage formulation, and in such amount as is therapeutically effective to alleviate
patients from the disease or condition or as is prophylactically effective to prevent
incidence of the disease or condition in the patient. The dose administered to a patient, in
the context of the present invention, should be sufficient to effect a beneficial response in a
patient over time such as a reduction or cessation of blood loss. The quantity of the
composition or vaccine to be administered may depend on the subject to be treated
inclusive of the age, sex, weight and general health condition thereof. In this regard,
precise amounts of the composition or vaccine for administration will depend on the
judgement of the practitioner. In determining the effective amount of the composition or
vaccine to be administered in the treatment of a disease or condition, the physician may
evaluate the progression of the disease or condition over time. In any event, those of skill
in the art may readily determine suitable dosages of the composition or vaccine of the
invention.

In a preferred embodiment, DNA-based immunopotentiating agent (e.g., 100 µg)
is delivered intradermally into a patient at day 1 and at week 8 to prime the patient. A
recombinant poxvirus (e.g., at 10^7 pfu/mL) from which substantially the same
immunopotentiating agent can be expressed is then delivered intradermally as a booster at
weeks 16 and 24, respectively.

The effectiveness of the immunisation may be assessed using any suitable
technique. For example, CTL lysis assays may be employed using stimulated splenocytes
or peripheral blood mononuclear cells (PBMC) on peptide coated or recombinant virus
infected cells using 51Cr labelled target cells. Such assays can be performed using for
example primate, mouse or human cells (Allen et al., 2000, J. Immunol 164(9): 4968-4978
also Woodberry et al., infra). Alternatively, the efficacy of the immunisation may be
monitored using one or more techniques including, but not limited to, HLA class I
Tetramer staining - of both fresh and stimulated PBMCs (see for example Allen et al.,
supra), proliferation assays (Allen et al., supra), Elispot™ Assays and intracellular
IFN-gamma staining (Allen et al., supra), ELISA Assays - for linear B cell responses; and
Western blots of cell sample expressing the synthetic polynucleotides.

5. Computer related embodiments 

The design or construction of a synthetic polypeptide sequence or a synthetic
polynucleotide sequence according to the invention is suitably facilitated with the
assistance of a computer programmed with software, which inter alia fragments a parent
sequence into fragments, and which links those fragments together in a different
relationship relative to their linkage in the parent sequence. The ready use of a parent
sequence for the construction of a desired synthetic molecule according to the invention
requires that it be stored in a computer-readable format. Thus, in accordance with the
present invention, sequence data relating to a parent molecule (e.g., a parent polypeptide)
is stored in a machine-readable storage medium, which is capable of processing the data to
fragment the sequence of the parent molecule into fragments and to link together the
fragments in a different relationship relative to their linkage in the parent molecule.

Therefore, another embodiment of the present invention provides a machine-readable
data storage medium, comprising a data storage material encoded with machine
readable data which, when used by a machine programmed with instructions for using said
data, fragments a parent sequence into fragments, and links those fragments together in a
different relationship relative to their linkage in the parent sequence. In a preferred
embodiment of this type, a machine-readable data storage medium is provided that is
capable of reverse translating the sequence of a respective fragment to provide a nucleic
acid sequence encoding the fragment and to link together in the same reading frame each
of the nucleic acid sequences to provide a polynucleotide sequence that codes for a
polypeptide sequence in which said fragments are linked together in a different relationship
relative to their linkage in a parent polypeptide sequence.

In another embodiment, the invention encompasses a computer for designing the
sequence of a synthetic polypeptide and/or a synthetic polynucleotide of the invention,
wherein said computer comprises: (a) a machine readable
data storage medium comprising a data storage material encoded with machine readable
data, wherein said machine readable data comprises the sequence of a parent polypeptide;
(b) a working memory for storing instructions for processing said machine-readable data;
(c) a central-processing unit coupled to said working memory and to said machine-readable
data storage medium, for processing said machine-readable data into said synthetic
polypeptide sequence and/or said synthetic polynucleotide; and (d) an output hardware
coupled to said central processing unit, for receiving said synthetic polypeptide sequence
and/or said synthetic polynucleotide.

In yet another embodiment, the invention contemplates a computer program
product for designing the sequence of a synthetic polynucleotide of the invention,
comprising code that receives as input the sequence of a parent polypeptide, code that
fragments the sequence of the parent polypeptide into fragments, code that reverse
translates the sequence of a respective fragment to provide a nucleic acid sequence
encoding the fragment, code that links together in the same reading frame each said nucleic
acid sequence to provide a polynucleotide sequence that codes for a polypeptide sequence
in which said fragments are linked together in a different relationship relative to their
linkage in the parent polypeptide sequence, and a computer readable medium that stores
the codes.
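
By way of a minimal illustrative sketch only, the four code components recited above (input, fragmentation, reverse translation and in-frame linkage) might be arranged as follows. The codon table, the fragment size of 10, the example sequence and the order `[2, 0, 1]` are assumptions chosen for illustration; they are not the codon usage of Figure 12 nor the algorithm of Figure 25.

```python
# Illustrative codon table: one codon per amino acid. These choices are
# placeholders (assumptions), not the codon usage shown in Figure 12.
CODON = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def fragment(parent, size):
    """Split the parent polypeptide into consecutive fragments of `size` residues."""
    return [parent[i:i + size] for i in range(0, len(parent), size)]


def reverse_translate(peptide):
    """Reverse translate a peptide into DNA, one codon per residue."""
    return "".join(CODON[aa] for aa in peptide)


def link_in_frame(fragments, order):
    """Join the reverse-translated fragments in the given order.

    Every fragment contributes a whole number of codons, so the reading
    frame is preserved across each junction.
    """
    return "".join(reverse_translate(fragments[i]) for i in order)


# Hypothetical 30-residue parent sequence, used only to exercise the functions.
parent = "MGARASVLSGGELDRWEKIRLRPGGKKKYK"
frags = fragment(parent, 10)
order = [2, 0, 1]  # any order other than 0, 1, 2 changes the linkage
dna = link_in_frame(frags, order)
print(dna)
```

Because each fragment is reverse translated before it is joined, the polynucleotide encodes the fragments in the new order without any frame shift at the junctions.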

A version of these embodiments is presented in Figure 23, which shows a system
10 including a computer 11 comprising a central processing unit ("CPU") 20, a working
memory 22 which may be, e.g., RAM (random-access memory) or "core" memory, mass
storage memory 24 (such as one or more disk drives or CD-ROM drives), one or more
cathode-ray tube ("CRT") display terminals 26, one or more keyboards 28, one or more
input lines 30, and one or more output lines 40, all of which are interconnected by a
conventional bidirectional system bus 50.

Input hardware 36, coupled to computer 11 by input lines 30, may be
implemented in a variety of ways. For example, machine-readable data of this invention
may be inputted via the use of a modem or modems 32 connected by a telephone line or
dedicated data line 34. Alternatively or additionally, the input hardware 36 may comprise
CD-ROM drives or disk drives 24. In conjunction with display terminal 26,
keyboard 28 may also be used as an input device.

Output hardware 46, coupled to computer 11 by output lines 40, may similarly be
implemented by conventional devices. By way of example, output hardware 46 may
include CRT display terminal 26 for displaying a synthetic polynucleotide sequence or a
synthetic polypeptide sequence as described herein. Output hardware might also include a
printer 42, so that hard copy output may be produced, or a disk drive 24, to store system
output for later use.

In operation, CPU 20 coordinates the use of the various input and output devices
36, 46, coordinates data accesses from mass storage 24 and accesses to and from working
memory 22, and determines the sequence of data processing steps. A number of programs
may be used to process the machine readable data of this invention. Exemplary programs
may use for example the steps outlined in the flow diagram illustrated in Figure 24.
Broadly, these steps include (1) inputting at least one parent polypeptide sequence; (2)
optionally adding alanine spacers at the ends of each polypeptide sequence; (3)
fragmenting the polypeptide sequences into fragments (e.g., 30 amino acids long), which
are preferably overlapping (e.g., by 15 amino acids); (4) reverse translating each fragment to
provide a nucleic acid sequence for each fragment and preferably using for the reverse
translation first and second most translationally efficient codons for a cell type, wherein the
codons are preferably alternated out of frame with each other in the overlaps of
consecutive fragments; (5) randomly rearranging the fragments; (6) checking whether
rearranged fragments recreate at least a portion of a parent polypeptide sequence; (7)
repeating randomly rearranging the fragments when rearranged fragments recreate said at
least a portion; or otherwise (8) linking the rearranged fragments together to produce a
synthetic polypeptide sequence and/or a synthetic polynucleotide sequence; and (9)
outputting said synthetic polypeptide sequence and/or a synthetic polynucleotide sequence.
An example of an algorithm which uses inter alia the aforementioned steps is shown in
Figure 25. By way of example, this algorithm has been used for the design of synthetic
polynucleotides and synthetic polypeptides according to the present invention for Hepatitis
C 1a and for melanoma, as illustrated in Figures 26 and 27.
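
The peptide-level steps (1) to (3) and (5) to (9) above may be sketched as follows; reverse translation (step 4) is omitted for brevity. This is a hedged illustration under stated assumptions and is not the algorithm of Figure 25. In particular, the test used here for "recreating a portion of the parent" (rejecting any arrangement in which a fragment is immediately followed by its natural successor) is an assumed interpretation, and the function names, the retry limit and the example sequence are hypothetical.

```python
import random


def add_alanine_spacers(seq, n=2):
    """Step 2: add n alanine residues to each end of the sequence."""
    return "A" * n + seq + "A" * n


def overlapping_fragments(seq, size=30, overlap=15):
    """Step 3: cut seq into fragments of `size` that overlap by `overlap`."""
    step = size - overlap
    frags, start = [], 0
    while True:
        frags.append(seq[start:start + size])
        if start + size >= len(seq):
            break
        start += step
    return frags


def recreates_parent(order):
    """Step 6 (assumed test): True if any fragment is directly followed by
    the fragment that follows it in the parent sequence."""
    return any(b == a + 1 for a, b in zip(order, order[1:]))


def scramble(n, rng, max_tries=1000):
    """Steps 5 and 7: shuffle until no parental adjacency remains."""
    order = list(range(n))
    for _ in range(max_tries):
        rng.shuffle(order)
        if not recreates_parent(order):
            return order
    raise RuntimeError("no acceptable arrangement found")


def design(parent, seed=0):
    """Steps 1 to 3, 5 to 9: return the fragment order and linked sequence."""
    rng = random.Random(seed)
    frags = overlapping_fragments(add_alanine_spacers(parent))
    order = scramble(len(frags), rng)
    return order, "".join(frags[i] for i in order)


# Hypothetical parent sequence, used only to exercise the functions.
order, synthetic = design("MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERF")
print(order)
print(synthetic)
```

A fixed seed is used so that a given design can be regenerated; a real program would presumably also record the chosen order alongside the output sequence.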

Figure 28 shows a cross section of a magnetic data storage medium 100 which can
be encoded with machine readable data, or set of instructions, for designing a synthetic
molecule of the invention, which can be carried out by a system such as system 10 of
Figure 23. Medium 100 can be a conventional floppy diskette or hard disk, having a
suitable substrate 101, which may be conventional, and a suitable coating 102, which may
be conventional, on one or both sides, containing magnetic domains (not visible) whose
polarity or orientation can be altered magnetically. Medium 100 may also have an opening
(not shown) for receiving the spindle of a disk drive or other data storage device 24. The
magnetic domains of coating 102 of medium 100 are polarised or oriented so as to encode
in a manner which may be conventional, machine readable data such as that described
herein, for execution by a system such as system 10 of Figure 23.

Figure 29 shows a cross section of an optically readable data storage medium 110
which also can be encoded with such machine-readable data, or set of instructions, for
designing a synthetic molecule of the invention, which can be carried out by a system such
as system 10 of Figure 23. Medium 110 can be a conventional compact disk read only
memory (CD-ROM) or a rewritable medium such as a magneto-optical disk, which is
optically readable and magneto-optically writable. Medium 110 preferably has a suitable
substrate 111, which may be conventional, and a suitable coating 112, which may be
conventional, usually on one side of substrate 111.

In the case of CD-ROM, as is well known, coating 112 is reflective and is
impressed with a plurality of pits 113 to encode the machine-readable data. The
arrangement of pits is read by reflecting laser light off the surface of coating 112. A
protective coating 114, which preferably is substantially transparent, is provided on top of
coating 112.

In the case of a magneto-optical disk, as is well known, coating 112 has no pits
113, but has a plurality of magnetic domains whose polarity or orientation can be changed
magnetically when heated above a certain temperature, as by a laser (not shown). The
orientation of the domains can be read by measuring the polarisation of laser light reflected
from coating 112. The arrangement of the domains encodes the data as described above.

In order that the invention may be readily understood and put into practical effect,
particular preferred non-limiting embodiments will now be described as follows.

EXAMPLES
EXAMPLE 1

Preparation of an HIV Savine

Experimental Protocol
Plasmids

The plasmid pDNAVacc is ampicillin resistant and contains an expression
cassette comprising a CMV promoter and enhancer, a synthetic intron, a multiple cloning
site (MCS) and a SV40 poly A signal sequence (Thomson et al., 1998). The plasmid
pTK7.5 contains a selection cassette, a pox virus 7.5 early/late promoter and a MCS
flanked on either side by Vaccinia virus TK gene sequences.

Recombinant Vaccinia Viruses 

Recombinant Vaccinia viruses expressing the gag, env (US) and pol (LAI) genes
of HIV-1 were used as previously described and denoted W-GAG, W-POL, W-ENV
(Woodberry et al., 1999; Kent et al., 1998).

Marker Rescue Recombination

Recombinant Vaccinia viruses containing Savine constructs were generated by
marker rescue recombination, using protocols described previously (Boyle et al., 1985).
Plaque purified viruses were tested for the TK phenotype and for the appropriate genome
arrangement by Southern blot and PCR.

Oligonucleotides

Oligonucleotides 50 nmol scale and desalted were purchased from Life
Technologies. Short oligonucleotides were resuspended in 100 µL of water, their
concentration determined, then diluted to 20 µM for use in PCR or sequencing reactions.
Long oligonucleotides for splicing reactions were denatured for 5 minutes at 94°C in
20 µL of formamide loading buffer then 0.5 µL gel purified on a 6% polyacrylamide gel.
Gel slices containing full-length oligonucleotides were visualised with ethidium bromide,
excised, placed in Eppendorf™ tubes, combined with 200 µL of water before being
crushed using the plunger of a 1 mL syringe. Before being used in splicing reactions the
crushed gel was resuspended in an appropriate volume of buffer and 1-2 µL of the
resuspendate used directly in the splicing reactions.

Sequencing 

Sequencing was performed using Dye terminator sequencing reactions and 
analyzed by the Biomedical Resource Facility at the John Curtin School of Medical 
Research using an ABI automated sequencer. 

Restimulation of Lymphocytes from HIV Infected Patients

Two pools of recombinant Vaccinia viruses containing W-AC1 + W-BC1 (Pool
1) or W-AC2 + W-BC2 + W-CC2 (Pool 2) were used to restimulate lymphocytes from
the blood samples of HIV-infected patients. Briefly, CTL lines were generated from
HIV-infected donor PBMC. A fifth of the total PBMC were infected with either Pool 1 or Pool 2
Vaccinia viruses then added back to the original cell suspension. The infected cell
suspension was then cultured with IL-7 for 1 week.

CTL Assays 

Restimulated PBMCs were used as effectors in a standard 51Cr-release CTL assay.
Targets were autologous EBV-transformed lymphoblastoid cell lines (LCLs) infected with
the following viruses: Pool 1, Pool 2, W-GAG, W-POL or W-ENV. Assay controls
included uninfected targets, targets infected with W-lacZ (virus control) and K562 cells.

Results 

HIV Savine Design 

A main goal of the Savine strategy is to include as much protein sequence
information from a pathogen or cancer as possible in such a way that potential T cell
epitopes remain intact and so that the vaccine or therapy is extremely safe. An HIV Savine
is described herein not only to compare this strategy to other strategies but also to produce
an HIV vaccine that would provide the maximum possible population coverage as well as
catering for the major HIV clades.

A number of design criteria were first determined to exploit the many advantages
of using a synthetic approach. One advantage is that it is possible to use consensus protein
sequences to design these vaccines. Using consensus sequences for a highly variable virus
like HIV should provide better vaccine coverage because individual viral isolate sequences
may have lost epitopes which induce CTL against the majority of other viral isolates. Thus,
using the consensus sequences of each HIV clade rather than individual isolate sequences
should provide better vaccine coverage. Taking this one step further, a consensus sequence
that covers all HIV clades should theoretically provide better coverage than using just the
consensus sequences for individual clades. Before designing such a sequence, however, it
was decided that a more appropriate and focussed HIV vaccine might be constructed if the
various clades were first ranked according to their relative importance. To establish such a
ranking the following issues were considered: current prevalence of each clade, the rate at
which each clade is increasing and the capacity of various regions of the world to cope
with the HIV pandemic (Figures 1 and 2). These criteria produced the following ranking:
Clade E > clade A > clade C > clade B > clade D > other clades. Clades E and A were
considered to be almost equal since they are very similar except in their envelope protein
sequences, which differ considerably.

Another advantage of synthesising a designed sequence is that it is possible to
incorporate degenerate sequences into its design. In the case of HIV, this means that
more than one amino acid can be included at various positions to improve the ability of the
vaccine to cater for the various HIV clades and isolates. Coverage is improved because
mutations in different HIV clades and also in individual isolate sequences, while mostly
destroying specific T cell epitopes, can result in the formation of new potentially useful
epitopes nearby (Goulder et al., 1997). Incorporating degenerate amino acid sequences,
however, also means that more than one construct must be made and mixed together. The
number of constructs required depends on the frequency with which mutations are
incorporated into the design. While this approach requires the construction of additional
constructs, these constructs can be prepared from the same set of degenerate long
oligonucleotides, significantly reducing the cost of providing such considerable interclade
coverage.

A set of degeneracy rules was developed for the incorporation of amino acid
mutations into the design which meant that a maximum of eight constructs would be
required so that theoretically all combinations were present, as follows: 1) Two amino
acids at three positions (or less) within any group of nine amino acids (i.e., present in a
CTL epitope); 2) Three amino acids at one position and two at another (or not) within any
group of nine amino acids; 3) Four amino acids at one position and two at another (or not)
within any group of nine amino acids. The reason why these rules were applied to nine
amino acids (the average CTL epitope size) and not to larger stretches of amino acid
sequence to cater for class II-restricted epitopes, is because class II-restricted epitopes
generally have a core sequence of nine amino acids in the middle which bind specifically
to class II MHC molecules with the extra flanking sequences stabilising binding, by
associating with either side of class II MHC antigens in a largely sequence independent
manner (Brown et al., 1993).
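
The three degeneracy rules can be expressed as a check over every window of nine positions, as in the following hedged sketch. The representation of a degenerate sequence as a list of strings (one string of alternative residues per position), the function names and the example sequences are all assumptions made for illustration. Each permitted pattern yields at most 2 x 2 x 2 = 8, 3 x 2 = 6 or 4 x 2 = 8 combinations per window, consistent with the stated maximum of eight constructs.

```python
from math import prod

# Permitted patterns of degenerate positions within any nine-residue window,
# written as the number of alternative residues at each degenerate position.
ALLOWED = {
    (2,), (2, 2), (2, 2, 2),  # rule 1: two residues at up to three positions
    (3,), (3, 2),             # rule 2: three at one position, two at another (or not)
    (4,), (4, 2),             # rule 3: four at one position, two at another (or not)
}


def _windows(counts, window):
    """Yield every run of `window` consecutive counts (or the whole list if shorter)."""
    for i in range(max(1, len(counts) - window + 1)):
        yield counts[i:i + window]


def obeys_rules(positions, window=9):
    """True if every window of nine positions matches one of the three rules."""
    counts = [len(set(p)) for p in positions]
    for w in _windows(counts, window):
        pattern = tuple(sorted((c for c in w if c > 1), reverse=True))
        if pattern and pattern not in ALLOWED:
            return False
    return True


def max_combinations(positions, window=9):
    """Largest number of distinct peptides encoded by any nine-residue window."""
    counts = [len(set(p)) for p in positions]
    return max(prod(w) for w in _windows(counts, window))


# Hypothetical degenerate sequence: three two-fold positions in ten residues.
example = ["S", "LI", "Y", "N", "TA", "V", "A", "TV", "L", "Y"]
print(obeys_rules(example), max_combinations(example))
```

A design that places three residues at one position and two at each of two others within nine residues would be rejected, since it would need 3 x 2 x 2 = 12 combinations.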

Using the HIV clade ranking described above, the amino acid degeneracy rules
and in some situations the similarity between amino acids, a degenerate consensus protein
sequence was designed for each HIV protein using the consensus protein sequences for
each HIV clade compiled by the Los Alamos HIV sequence database (Figures 3-11) (HIV
Molecular Immunology Database, 1997). It is important to note that in some situations the
order with which each of the above design criteria was applied was altered. Each time this
was done the primary goal, however, was to increase the ability of the Savine to cater for
interclade differences. Two isolate sequences, GenBank accession U51189 and U46016,
for clade E and clade C, respectively, were used when a consensus sequence for some HIV
proteins from these two clades was unavailable (Gao et al., 1996; Salminen et al., 1996).
The design of a consensus sequence for the hypervariable regions of the HIV envelope
protein and in some cases between these regions (hypervariable regions 1-2 and 3-5) was
difficult and so these regions were excluded from the vaccine design.

Once a degenerate consensus sequence was designed for each HIV protein, an
approach was then determined for incorporating all the protein sequences safely into the
vaccine. One convenient approach to ensure that a vaccine will be safe is to systematically
fragment and randomly rearrange the protein sequences together thus abrogating or
otherwise altering their structure and function. The protein sequences still have to be
immunologically functional, however, meaning that the process used to fragment the
sequences should not destroy potential epitopes. To decide on the best approach for
systematically fragmenting protein sequences, the main criteria used were the size of T
cell epitopes and their processing requirements. Class I-restricted T cell epitopes are 8-10
amino acids long and generally require 2-3 natural flanking amino acids to ensure their
efficient processing and presentation if placed next to unnatural flanking residues (Del Val
et al., 1991; Thomson et al., 1995). Class II-restricted T cell epitopes range between 12-25
amino acids long and do not appear to require natural flanking residues for processing;
however, it is difficult to rule out a role for natural flanking residues in all cases due to the
complexity of their processing pathways (Thomson et al., 1998). Also class II-restricted
epitopes, despite being larger than CTL epitopes, generally have a core sequence of 9-10
amino acids, which binds to MHC molecules in a sequence specific fashion. Thus, based
on current knowledge, it was decided that an advantageous approach was to overlap the
fragments by at least 15 amino acids to ensure that potential epitopes which might lie
across fragment boundaries are not lost and to ensure that CTL epitopes near fragment
boundaries, that are placed beside or near inhibitory amino acids in adjacent fragments, are
processed efficiently. In deciding the optimal fragment size, the main criteria used were
that size had to be small enough to cause the maximum disruption to the structure and
function of proteins but large enough to cover the sequence information as efficiently as
possible without any further unnecessary duplication. Based on these criteria the fragments
would be twice the overlap size, in this case 30 amino acids long.
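
The arithmetic behind this choice can be checked with a short sketch. With fragments of 30 residues overlapping by 15, a new fragment starts every 15 residues, so any stretch of up to 16 consecutive residues (the overlap plus one) lies wholly within at least one fragment; this covers class I epitopes of 8-10 residues and the 9-10 residue cores of class II epitopes. The function names and the parent length of 100 used below are assumptions for illustration only.

```python
def fragment_spans(length, size=30, overlap=15):
    """Return (start, end) index pairs of overlapping fragments over a
    sequence of the given length; the last fragment may be shorter."""
    step = size - overlap
    spans, start = [], 0
    while True:
        spans.append((start, min(start + size, length)))
        if start + size >= length:
            break
        start += step
    return spans


def every_window_covered(length, window, size=30, overlap=15):
    """True if every run of `window` consecutive residues of the parent
    lies entirely inside at least one fragment."""
    spans = fragment_spans(length, size, overlap)
    return all(
        any(s <= p and p + window <= e for s, e in spans)
        for p in range(length - window + 1)
    )


# Hypothetical parent length of 100 residues.
print(every_window_covered(100, 9))               # nine-residue epitopes, 15-residue overlap
print(every_window_covered(100, 9, overlap=0))    # the same epitopes without any overlap
```

Without overlap, a nine-residue epitope that straddles a fragment boundary is split between two fragments, which is the loss the overlap is intended to prevent.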

The designed degenerate protein sequences were then separated into fragments 30
amino acids long and overlapping by fifteen amino acids. Two alanine amino acids were
also added to the start and end of the first and last fragment for each protein or envelope
protein segment to ensure these fragments were not placed directly adjacent to amino acids
capable of blocking epitope processing (Del Val et al., 1991). The next step was to reverse
translate each protein sequence back into DNA. Duplicating DNA sequences was avoided
when constructing DNA sequences encoding a tandem repeat of identical or homologous
amino acid sequences to maximise expression of the Savine. In this regard, the first and
second most commonly used mammalian codons (shown in Figure 12) were assigned to
amino acids in these repeat regions, wherein a first codon was used to encode an amino
acid in one of the repeated sequences and wherein a second but synonymous codon was
used for the other repeated sequence (e.g., see the gag HIV protein in Figure 13). To cater
for the designed amino acid mutations more than one base was assigned to some positions
using the IUPAC DNA codes without exceeding three base variations (eight
possible combinations) in any group of 27 bases (Figure 12). Where a particular
combination of amino acids could not be incorporated, because too many degenerate bases
would be required, some or all of the amino acid degeneracy was removed according to the
protein consensus design rules outlined above. The degenerate codons were also checked
to determine if they could encode a stop codon. If stop codons could not be avoided, then
the amino acid degeneracy was again simplified according to the protein consensus
design rules outlined above.
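The two checks described above, namely that a degenerate codon must not be able to encode a stop codon and that no group of 27 bases may encode more than eight combinations, can be sketched as follows. The IUPAC table is the standard nucleotide code; the function names `expand`, `can_encode_stop` and `max_combinations` are hypothetical and the sliding-window reading of "any group of 27 bases" is an assumption.

```python
from itertools import product

# Standard IUPAC nucleotide codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate.upper()))]


def can_encode_stop(codon):
    """True if any expansion of a (possibly degenerate) codon is a stop codon."""
    return any(c in STOP_CODONS for c in expand(codon))


def max_combinations(dna, window=27):
    """Largest number of concrete sequences encoded by any `window`-base stretch."""
    counts = [len(IUPAC[b]) for b in dna.upper()]
    best = 1
    for i in range(max(1, len(counts) - window + 1)):
        n = 1
        for c in counts[i:i + window]:
            n *= c
        best = max(best, n)
    return best


print(can_encode_stop("TAR"))                    # TAR expands to TAA and TAG
print(max_combinations("GCR" * 3 + "GCA" * 6))   # three two-fold positions
```

Three two-fold degenerate positions within one window give 2 x 2 x 2 = 8 combinations, which corresponds to the limit of eight stated in the text.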

The designed DNA segments were then scrambled randomly and joined to create
twenty-two subcassettes approximately 840 bp in size. Extra DNA sequences incorporating
sites for one of the cohesive restriction enzymes XbaI, SpeI, AvrII or NheI and 3 additional
base pairs (to cater for premature Taq polymerase termination) were then added to each
end of each subcassette (Figure 14). Some of these extra DNA sequences also contained
the cohesive restriction sites for SalI or XhoI, Kozak signal sequences and start or stop
codons to enable the subcassettes to be joined and expressed either as three large cassettes
or one full length protein (Figures 14 and 15).
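The scrambling and joining step can be sketched as a random shuffle followed by packing of the shuffled segments into subcassettes. This is only an illustration: the function name `scramble_into_subcassettes`, the use of 840 bp as an upper limit and the fixed random seed are assumptions, and the addition of restriction sites, Kozak sequences and start or stop codons is not modelled.

```python
import random


def scramble_into_subcassettes(segments, max_bp=840, seed=None):
    """Shuffle DNA segments and pack them, in shuffled order, into subcassettes.

    Assumption: a subcassette is closed as soon as adding the next segment
    would take it past `max_bp`.
    """
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    subcassettes, current, current_len = [], [], 0
    for seg in shuffled:
        if current and current_len + len(seg) > max_bp:
            subcassettes.append("".join(current))
            current, current_len = [], 0
        current.append(seg)
        current_len += len(seg)
    if current:
        subcassettes.append("".join(current))
    return subcassettes


# Sixteen distinct 90-base toy segments (30 codons each), not real sequences
toy_segments = ["ACGT"[i % 4] * 45 + "ACGT"[(i // 4) % 4] * 45 for i in range(16)]
for sub in scramble_into_subcassettes(toy_segments, seed=1):
    print(len(sub))
```

A 30 amino acid fragment corresponds to 90 bases, so nine such segments (810 bp) fit within a limit of 840 bp in this sketch.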

In designing the HIV Savine one issue that required investigation was whether
such a large DNA molecule would be fully expressed and whether epitopes encoded near
the end of the molecule would be efficiently presented to the immune system. The
inventors also wished to show that mixing two or more degenerate Savine constructs
together could induce T cell responses that recognise mutated sequences. To examine both
issues DNA coding for a degenerate murine influenza nucleoprotein CTL epitope, NP365-373,
which differs by two amino acids at positions 71 and 72 in influenza strain A/PR/8/34
compared to the A/NT/60/68 strain and restricted by H2-Db, was inserted before the last
stop codon at the end of the HIV Savine design (Figure 15). An important and unusual
characteristic of both of these naturally occurring NP365-373 sequences, which enabled
the present inventors to examine the effectiveness of incorporating mutated sequences, is
that they generate CTL responses which do not cross react with the alternate sequence
(Townsend et al., 1986). This is an unusual characteristic because epitopes not destroyed
by mutation usually induce CTL responses that cross-react.
Up to ten long oligonucleotides, each up to 100 bases long, and two short amplification
oligonucleotides were synthesised to enable construction of each subcassette (Life
Technologies). In designing each oligonucleotide the 3' end and in most cases also the 5'
end had to be either a 'c' or a 'g' to ensure efficient extension during PCR splicing. The
overlap region for each long oligonucleotide was designed to be at least 16 bp with
approximately 50% G/C content. Also oligonucleotide overlaps were not placed where
degenerate DNA bases coded for degenerate amino acids to avoid splicing difficulties
later. Where this was too difficult some degenerate bases were removed according to the
protein consensus design rules outlined above and indicated in Figure 12. Figure 16 shows
an example of the oligonucleotide design for each subcassette.
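The oligonucleotide design rules listed above (maximum length of 100 bases, a G or C at the 3' end, an overlap of at least 16 bases with roughly 50% G/C content, and no degenerate bases inside the overlap) lend themselves to a simple automated check. The sketch below is illustrative only: the function names, the tolerance of 15 percentage points around 50% G/C and the convention that the overlap sits at the 3' end are assumptions not found in the specification.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_oligo(oligo, overlap_len, min_overlap=16, gc_target=0.5, gc_tol=0.15):
    """Return a list of design-rule violations for one long oligonucleotide.

    `overlap_len` is the number of 3'-terminal bases shared with the next
    oligonucleotide (an assumed convention). `gc_tol` is an assumed tolerance.
    """
    oligo = oligo.upper()
    problems = []
    if len(oligo) > 100:
        problems.append("longer than 100 bases")
    if oligo[-1] not in "GC":
        problems.append("3' end is not G or C")
    if overlap_len < min_overlap:
        problems.append("overlap shorter than 16 bases")
    else:
        overlap = oligo[-overlap_len:]
        if abs(gc_fraction(overlap) - gc_target) > gc_tol:
            problems.append("overlap G/C content far from 50%")
        if any(base not in "ACGT" for base in overlap):
            problems.append("degenerate base inside overlap")
    return problems


print(check_oligo("ATATATATAT" + "ACGTACGTACGTACGC", overlap_len=16))
print(check_oligo("ACGTACGTAT", overlap_len=8))
```

An empty list means that the oligonucleotide passes every rule encoded in the sketch; the 5' end rule is omitted because the text applies it only "in most cases".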

Construction of the HIV Savine 

Five of each group of ten designed oligonucleotides were spliced together using
stepwise asymmetric PCR (Sandhu et al., 1992) and Splicing by Overlap Extension
(SOEing) (Figure 17a). Each subcassette was then PCR amplified and cloned into
pBluescript™ II KS- using BamHI/EcoRI, and 16 individual clones were sequenced. Mutations,
deletions and insertions were present in the large majority of the clones for each
subcassette, despite acrylamide gel purification of the long oligonucleotides. In order to
construct a functional Savine with minimal mutations, two clones for each subcassette with
no insertions or deletions, and hence a complete open reading frame, and with minimal
numbers of non-designed mutations were selected from the sixteen available. The
subcassettes were then excised from their plasmids and joined by stepwise PCR-amplified
ligation using the polymerase blend Elongase™ (Life Technologies), T4 DNA ligase and the
cohesive restriction enzymes XbaI/SpeI/AvrII/NheI, to generate two copies of cassettes A,
B and C as outlined in Figure 14 and shown in Figure 17b. Predicted sequences for these
cassettes are shown in Figure 30. Each cassette was then reamplified by PCR with
Elongase™, cloned into pBluescript™ II KS- and 3 of the resulting plasmid clones were
sequenced using 12 of the 36 sequencing primers designed to cover the full length
construct. Clones with minimal or no further mutations were selected for transfer into
plasmids for DNA vaccination or used to make recombinant poxviruses. A summary of the
number of designed and non-designed mutations in each Savine construct is presented in
Table 1.
TABLE 1
Summary of mutations

                         Designed   Expected   Actual    Non-designed
Cassette A     1896      249        124        107       5 (AC1), 8 (AC2)
Cassette B     1184      260        130        124       11 (BC1), 4 (BC2)
Cassette C     1969      276        138        121       10 (CC1), 14 (CC2)
Full length    5742      785        392        352       26 (FL1), 26 (FL2)

Summary of the mutations present in the two full-length clones constructed as determined by
sequencing. Includes the number of mutations designed, expected and actually present in the 2 clones and the
number of non-designed mutations in each cassette and full-length clone.



HIV Savine DNA vaccines and Recombinant Vaccinia viruses

To test the immunological effectiveness of the HIV Savine constructs the cassette
sequences were transferred into DNA vaccine and poxvirus vectors. These vectors were
used either separately in the immunological assays described below or together in a
'prime-boost' protocol, which has been shown previously to generate strong T cell
responses in vivo (Kent et al., 1997).

DNA Vaccination plasmids were constructed by excising the cassettes from the
selected plasmid clones with XbaI/XhoI (cassette A) or XbaI/SalI (cassettes B and C) and
ligating them into pDNAVacc cut with XbaI/XhoI to create pDVAC1, pDVAC2, pDVBC1,
pDVBC2, pDVCC1, pDVCC2, respectively (Figure 18a). These plasmids were then
further modified by cloning into their XbaI site a DNA fragment excised using XbaI/AvrII
from pTUMERA2 and encoding a synthetic endoplasmic reticulum (ER) signal sequence
from the Adenovirus E1A protein (Persson et al., 1980) (Figure 18a). ER signal sequences
have been shown previously to enhance the presentation of both CTL and T helper
epitopes in vivo (Ishioka, G.Y., 1999; Thomson et al., 1998). The plasmids pDVERAC1,
pDVERBC1, pDVERCC1 and pDVERAC2, pDVERBC2, pDVERCC2 were then mixed
together to create plasmid pool 1 and pool 2, respectively. Each plasmid pool collectively
encodes one copy of the designed full-length HIV Savine.

Plasmids to generate recombinant Vaccinia viruses which express HIV Savine
sequences were constructed by excising the various HIV Savine cassettes from the selected
plasmid clones using BamHI/XhoI (cassette A) or BamHI/SalI (cassettes B and C) and
cloning them into the marker rescue plasmid, pTK7.5, cleaved with BamHI/SalI. These
pTK7.5-derived plasmids were then used to generate recombinant Vaccinia viruses by marker
rescue recombination using established protocols (Boyle et al., 1985) to generate W-AC1,
W-AC2, W-BC1, W-BC2, W-CC1 and W-CC2 (Figure 18b).

Two further DNA vaccine plasmids were constructed, each encoding a version of
the full length HIV Savine (Figure 18c). Briefly, the two versions of cassette B were
excised with XhoI and cloned into the corresponding selected plasmid clones containing
cassette A sequences that were cut with XhoI/SalI to generate pBSAB1 and pBSAB2
respectively. The joined A/B cassettes in pBSAB1 and pBSAB2 were excised with
XbaI/XhoI and cloned into pDVCC1 and pDVCC2, respectively, which had been cleaved with
XbaI/XhoI, to generate pDVFL1 and pDVFL2. These were then further modified to contain
an ER signal sequence using the same cloning strategy as outlined in Figure 18a.

Restimulation of HIV specific lymphocytes from HIV infected patients 

The present inventors examined the capacity of the HIV Savine to restimulate
HIV-specific polyclonal CTL responses from HIV-infected patients. PBMCs from three
different patients were restimulated in vitro with two HIV Savine Vaccinia virus pools
(Pool 1 included W-AC1 and W-BC1; Pool 2 included W-AC2, W-BC2 and W-CC2)
then used in CTL lysis assays against LCLs infected either with one of the Savine Vaccinia
virus pools or Vaccinia viruses which express gag, env or pol. Figure 19 clearly shows
that, in all three assays, both HIV Savine viral pools restimulated HIV-specific CTL
responses which could recognise targets expressing whole natural HIV antigens and not
targets which were uninfected or infected with the control Vaccinia virus. Furthermore, in
all three cases, both pools restimulated responses that recognised all three natural HIV
antigens. This result suggests that the combined Savine constructs will provide broader
immunological coverage than single antigen based vaccine approaches. The level of lysis
in each case of targets infected with Savine viral pools was significantly higher than the
lysis recorded for any other infected target. This probably reflects the combined CTL
responses to gag, pol, and env plus other HIV antigens not analysed here but whose
sequences are also incorporated into the Savine constructs.

CTL recognition of each HIV antigen is largely controlled by each patient's HLA
background, hence the pattern of CTL lysis for whole HIV antigens is different in each
patient. Interestingly, this CTL lysis pattern did not change when the second Savine
Vaccinia virus pool was used for CTL restimulation. In these assays, therefore, the
inventors were unable to demonstrate clear differences between pools 1 and 2, despite pool
1 lacking a Vaccinia virus expressing cassette CC1 and despite the many amino acid
differences between the A and B cassettes in each pool (see Table 1).

From the foregoing, the present inventors have developed a novel
vaccine/therapeutic strategy. In one embodiment, pathogen or cancer protein sequences are
systematically fragmented, reverse translated back into DNA, rearranged randomly then
joined back together. The designed synthetic DNA sequence is then constructed using long
oligonucleotides and can be transferred into a range of delivery vectors. The vaccine
vectors used here were DNA vaccine plasmids and recombinant poxvirus vectors which
have been previously shown to elicit strong T cell responses when used together in a
'prime-boost' protocol (Kent et al., 1997). An important advantage of scrambled antigen
vaccines or 'Savines' is that the amount of starting sequence information for the design can
be easily expanded to include the majority of the protein sequences from a pathogen or for
cancer, thereby providing the maximum possible vaccine or therapy coverage for a given
population.

An embodiment of the systematic fragmentation approach described herein was
based on the size and processing requirements for T cell epitopes and was designed to
cause maximal disruption to the structure and function of protein sequences. This
fragmentation approach ensures that the maximum possible range of T cell epitopes will be
present from any incorporated protein sequence without the protein being functional and
able to compromise vaccine safety.

Another important advantage of Savines is that consensus protein sequences can
be used for their design. This feature is only applicable when the design needs to cater for
pathogen or cancer antigens whose sequence varies considerably. HIV is a highly
mutagenic virus, hence this feature was utilised extensively to design a vaccine which has
the potential to cover not only field isolates of HIV but also the major HIV clades involved
in the current HIV pandemic. To construct the HIV Savine, one set of long
oligonucleotides was synthesised, which included degenerate bases in such a way that 8
constructs are theoretically required for the vaccine to contain all combinations in any
stretch of 9 amino acids. The inventors believe that this approach can be improved for the
following reasons: 1) While degenerate bases should be theoretically equally represented,
in practice some degenerate bases were biased towards one base or the other, leading to a
lower than expected frequency of the designed mutations in the two full length HIV
Savines which were constructed (see Table 1). 2) Only sequence combinations actually
present in the HIV clade consensus sequences are required to get full clade coverage,
hence the number of full length constructs needed could be reduced. To reduce the number
of constructs however, separate sets of long oligonucleotides would have to be synthesised,
significantly increasing the cost, time and effort required to generate a vaccine capable of
such considerable vaccine coverage.

A significant problem during the construction of the HIV Savine synthetic DNA
sequence was the incorporation of non-designed mutations. The most serious types of
mutations were insertions, deletions or those giving rise to stop codons, all of which
change the frame of the synthesised sequences and/or cause premature truncation of the
Savine proteins. These types of mutation were removed during construction of the HIV
Savines by sequencing multiple clones after subcassette and cassette construction and
selecting functional clones. The major source of these non-designed mutations was in the
long oligonucleotides used for Savine synthesis, despite their gel purification. This
problem could be reduced by making the initial subcassettes smaller, thereby reducing the
possibility of corrupted oligonucleotides being incorporated into each subcassette clone.
The second major cause of non-designed mutations was the large number of PCR cycles
required for the PCR and ligation-mediated joining of the subcassettes. Including extra
sequencing and clone selection steps during the subcassette joining process should help to
reduce the frequency of non-designed mutations in future constructs. Finally, another
method that could help reduce the frequency of such mutations at all stages is to use
resolvase treatment. Resolvases are bacteriophage-encoded endonucleases which recognise
disruptions to double stranded DNA and are primarily used by bacteriophages to resolve
Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). T7 endonuclease I has already
been used by the present inventors in synthetic DNA constructions to recognise mutations
and cleave corrupted dsDNA to allow gel purification of correct sequences. Cleavage of
corrupted sequences occurs because, after a simple denaturing and hybridisation step,
mutated DNA hybridises to correct DNA sequences and results in a mispairing of DNA
bases which is able to be recognised by the resolvase. This method resulted in a 50%
reduction in the frequency of errors. Further optimisation of this method and the use of a
thermostable version of this type of enzyme could further reduce the frequency of errors
during long Savine construction.

Two pools of Vaccinia viruses expressing Savine cassettes were both shown to
restimulate HIV-specific responses from three different patients infected with B clade HIV
viruses. These results provide a clear indication that the HIV Savine should provide broad
coverage of the population because each patient had a different HLA pattern yet both pools
were able to restimulate HIV-specific CTL responses in all three patients against all three
natural HIV proteins tested. Also, both pools were shown to restimulate virtually identical
CTL patterns in all three patients. This result was unexpected because some responses
should have been lost or gained due to the amino acid differences between the two pools
and because Pool 1 is only capable of expressing 2/3 of the full length HIV Savine. There
are two suggested reasons why the pattern of CTL lysis was not altered between the two
viral pools. Firstly, the sequences in the Savine constructs are nearly all duplicated because
the fragment sequences overlap. Hence the loss of a third of the Savine may not have
excluded sufficient T cell epitopes for differences to be detected in only three patient
samples against only three HIV proteins. Secondly, while mutations often destroy T cell
epitopes, if they remain functional, then the CTL they generate frequently can recognise
alternate epitope sequences. Taken together, these findings indirectly suggest that combining
only two Savine constructs may provide robust multiclade coverage. Further experiments
are being carried out to directly examine the capacity of the HIV Savine to stimulate CTL
generated by different strains of HIV virus. The capacity of the two HIV-1 Savine
Vaccinia vector pools to stimulate CD4+ T cell HIV-1 specific responses from infected
patients was also tested (Figure 20). Both patients showed significant proliferation of
CD4+ T cells although the two pools did not show consistent patterns, suggesting that the two
pools may provide wider vaccine coverage than using either pool independently.
The present inventors have generated a novel vaccine strategy, which has been
used to generate what the inventors believe to be the most effective HIV candidate vaccine
to date. The inventors have used this vaccine to immunise naive mice. Figure 21 shows
conclusively that the HIV-1 Savine described above can generate a Gag and Nef CTL
response in naive mice. It should be noted, however, that the Nef CTL epitope appeared to
exist only in Pool 1 since it was not restimulated by Pool 2. This is further proof of the
utility of combining HIV-1 Savine Pool 1 and Pool 2 components together to provide
broader vaccine coverage.

The HIV-1 Savine Vaccinia vectors have also been used to restimulate in vivo
HIV-1 responses in pre-immune M. nemestrina monkeys. These experiments (Figure 22)
showed, by IFN-γ ELISPOT and CD69 expression on both CD4 and CD8 T cells, that the
ability of the HIV-1 Savine to restimulate HIV-1 specific responses in vivo is equivalent
to, or perhaps better than, that of another HIV-1 candidate vaccine.

This is a generic strategy able to be applied to many other human infections or
cancers where T-cell responses are considered to be important for protection or recovery.
With this in mind the inventors have begun constructing Savines for melanoma, cervical
cancer and Hepatitis C. In the case of melanoma, the majority of the currently identified
melanoma antigens have been divided into two groups, one containing antigens associated
with melanoma and one containing differentiation antigens from melanocytes, which are
often upregulated in melanomas. Two Savine constructs are presently being constructed to
cater for these two groups. The reason for making the distinction is that treatment of
melanoma might first proceed using the Savine that incorporates fragments of melanoma
specific antigens only. If this Savine fails to control some metastases then the less specific
Savine containing the melanocyte-specific antigens can then be used. It is important to
point out that other cancers also express many of the antigens specific to melanomas e.g.,
testicular and breast cancers. Hence the melanoma specific Savine may have therapeutic
benefits for other cancers.

A small Savine is also being constructed for cervical cancer. This Savine will
contain two antigens, E6 and E7, from two strains of human papilloma virus (HPV),
HPV-16 and HPV-18, directly linked with causing the majority of cervical cancers worldwide.
There is a large number of sequence differences in these two antigens between the two
strains which would normally require two Savines to be constructed. However since this
Savine is small, the antigen fragments from both strains are being scrambled together.
While it is normally better for the Savine approach to include all or a majority of the
antigens from a virus, in this case only E6 and E7 are expressed during viral latency or in
cervical carcinomas. Hence in the interests of simplicity, the rest of the HPV genome will
not be included although all HPV antigens would be desirable in a Savine against genital
warts.

Two Savines have also been constructed for two strains of hepatitis C, a major
cause of liver disease in the world. Hepatitis C is similar to HIV in the requirements for a
vaccine or therapeutic. However, the major hepatitis C strains share significantly lower
homology, 69-79%, with one another than do the various HIV clades. To cater for this the
inventors have decided to construct two separate constructs to cater for the two major
strains present in Australia, types 1a and 3a, which together cause approximately 80-95% of
hepatitis C infections in this country. Both constructs will be approximately the same size
as the HIV Savine but will be blended together into a single vaccine or therapy.

Overall it is believed that the Savine vaccine strategy is a generic technology
likely to be applied to a wide range of human diseases. It is also believed that because it is
not necessary to characterise each antigen, this technology will be actively applied to
animal vaccines as well where research into vaccines or therapies is often inhibited by the
lack of specific reagents, modest research budgets and poor returns on animal vaccines.

EXAMPLE 2
Hepatitis C Savine 

Synthetic immunomodulatory molecules have also been designed for treating
Hepatitis C. In one example, the algorithm of Figure 25 was applied to a consensus
polyprotein sequence of Hepatitis C 1a to facilitate its segmentation into overlapping
segments (30 aa segments overlapping by 15 aa), the rearrangement of these segments into
a scrambled order and the output of Savine nucleic acid and amino acid sequences, as
shown in Figure 26. Exemplary DNA cassettes (A, B and C) are also shown in Figure 26,
which contain suitable restriction enzyme sites at their ends to facilitate their joining into a
single expressible open reading frame.
EXAMPLE 3
Melanoma Savine

The algorithm of Figure 25 was also applied to melanocyte differentiation
antigens (gp100, MART, TRP-1, Tyros, Trp-2, MC1R, MUC1F and MUC1R) and to
melanoma specific antigens (BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME,
TRP2IN2, NYNSO1a, NYNSO1b and LAGE1), as shown in Figure 27, to provide separate
Savine nucleic acid and amino acid sequences for treating or preventing melanoma.

EXAMPLE 4 

Resolvase Repair Experiment 

A resolvase can be used advantageously to repair errors in polynucleotides. The
following procedure outlines resolvase repair of a synthetic 340 bp fragment in which
DNA errors were common.

Method 

The 340 bp fragment was PCR amplified and gel purified on a 4% agarose gel.
After spin purifying, 10 µL of the eluate corresponding to approximately 100 ng was
subjected to the resolvase repair treatment. The rest of the DNA sample was stored for later
cloning as the untreated control.

2 µL of 10x PCR buffer, 2 µL of 20 mM MgCl2 and 6 µL of MilliQ™ water
(MQW) and Taq DNA polymerase were added to the 10 µL DNA sample. The mixture
was subjected to the following thermal profile: 95°C for 5 min, 65°C for 30 min, cooled and
held at 37°C. Five µL of 10x T7 endonuclease I buffer, 8 µL of the 1/50 T7endoI enzyme
stock and 17 µL of MQW were added, mixed and incubated for 30 min. Loading buffer
was added to the sample and the sample was electrophoresed on a 4% agarose gel. A faint
band corresponding to the full length fragment was excised and subjected to 15 further
cycles of PCR. The amplified fragment was agarose gel purified and, along with the
untreated DNA sample, cloned into pBluescript. Eleven plasmid clones for each DNA
sample were sequenced and the number and type of errors compared (see Tables 2 and 3).
Buffers were as follows:

10x T7 endonuclease buffer

2.5 mL 1 M TRIS pH 7.8, 0.5 mL 1 M MgCl2, 25 µL 1 M DTT, 50 µL 10 mg/mL BSA,
2 mL MQW made up to a total of 5 mL.

T7 endonuclease I stock

Concentrated sample of enzyme prepared by, and obtained from, Jeff Babon (St
Vincent's Hospital) was diluted 1/50 using the following dilution buffer: 50 µL 1 M TRIS
pH 7.8, 0.1 µL 1 M EDTA pH 8, 5 µL 100 mM glutathione, 50 µL 10 mg/mL BSA, 2.3 mL
MQW, 2.5 mL glycerol made up to a total of 5 mL.

Results

The results are summarised in Tables 2 and 3. 



TABLE 2

Untreated                   Resolvase-treated
A/T to G/C = 6              A/T to G/C = 1
G/C to A/T = 12             G/C to A/T = 7
A/T to deletion = 1         A/T to deletion = 1
G/C to deletion = 6         G/C to deletion = 3

TABLE 3

Untreated                   Resolvase-treated
6/11 contained deletions    3/11 contained deletions
9/11 contained mutations    7/11 contained mutations
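As a worked check on the figures in Table 2, the per-category error counts can be totalled and compared. In the sketch below, taking the first value in each pair as the untreated sample and the second as the resolvase-treated sample is an assumption based on the surrounding text, and the variable names are illustrative only.

```python
# Error counts transcribed from Table 2; the assignment of the first value in
# each pair to the untreated sample is an assumption (see lead-in).
untreated = {"A/T to G/C": 6, "G/C to A/T": 12, "A/T to deletion": 1, "G/C to deletion": 6}
treated = {"A/T to G/C": 1, "G/C to A/T": 7, "A/T to deletion": 1, "G/C to deletion": 3}

total_untreated = sum(untreated.values())
total_treated = sum(treated.values())
reduction = 1 - total_treated / total_untreated

print(total_untreated, total_treated, f"{reduction:.0%}")
```

Under that assumption the totals are 25 and 12 errors across the eleven clones of each sample, a reduction of about one half.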
Discussion/Conclusion



While overall the number of correct clones obtained was not significantly
different, there was a significant difference in the level of error. This reduction in errors
becomes more significant as greater numbers of long oligonucleotides are joined into the
one construct, i.e., increasing the difference between untreated versus treated samples in the
chance of obtaining a correct clone. It is believed that combining T7 endonuclease I with
another resolvase such as T4 endonuclease VII may further enhance repair or increase the
bias against errors.

Importantly, this experiment was not optimised, e.g., by using proofreading PCR
enzymes or optimised conditions. Finally, if the repair reaction is carried out during normal
PCR, for example, by including a thermostable resolvase, it is believed that amplification
of already damaged long oligonucleotides, and the normal accumulation of PCR induced
errors, even using proofreading polymerases during PCR, could be reduced significantly.
The repair of damaged long oligonucleotides is particularly important for synthesis of long
DNA fragments such as in Savines because, while the rate of long oligonucleotide damage
is typically <5%, after joining 10 oligonucleotides, the error rate approaches 50%. This is
true even using the best proofreading PCR enzymes because these enzymes do not verify
the sequence integrity using correct oligonucleotide templates that exist as a significant
majority (95%) in a joining reaction.
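The way in which a small per-oligonucleotide damage rate compounds when several oligonucleotides are joined can be illustrated with a short calculation. The sketch assumes that damage occurs independently in each oligonucleotide, which is a simplifying assumption not stated in the specification; the function name `fraction_with_error` is hypothetical.

```python
def fraction_with_error(per_oligo_damage, n_oligos):
    """Probability that at least one of `n_oligos` joined oligonucleotides is
    damaged, assuming damage is independent between oligonucleotides."""
    return 1 - (1 - per_oligo_damage) ** n_oligos


# 5% damage per oligonucleotide, ten oligonucleotides joined
print(round(fraction_with_error(0.05, 10), 3))
```

Under this simple model a 5% damage rate gives roughly a 40% chance that a construct assembled from ten oligonucleotides carries at least one error, which is of the same order as the figure quoted above.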

The disclosure of every patent, patent application, and publication cited herein is 
incorporated herein by reference in its entirety. 

The citation of any reference herein should not be construed as an admission that
such reference is available as "Prior Art" to the instant application.

Throughout the specification the aim has been to describe the preferred 
embodiments of the invention without limiting the invention to any one embodiment or 
specific collection of features. Those of skill in the art will therefore appreciate that, in 
light of the instant disclosure, various modifications and changes can be made in the
particular embodiments exemplified without departing from the scope of the present 
invention. All such modifications and changes are intended to be included within the 
scope of the appended claims. 
WHAT IS CLAIMED IS:

1. A synthetic polypeptide comprising a plurality of different segments of at least one 
parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide. 

2. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
single parent polypeptide. 

3. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
plurality of different parent polypeptides. 

4. The synthetic polypeptide of claim 1, wherein the segments in said synthetic 
polypeptide are linked sequentially in a different order or arrangement relative to their 
linkage in said at least one parent polypeptide. 

5. The synthetic polypeptide of claim 4, wherein the segments in said synthetic 
polypeptide are randomly rearranged relative to their order or arrangement in said at least 
one parent polypeptide. 

6. The synthetic polypeptide of claim 1, wherein the size of an individual segment is at 
least 4 amino acids. 

7. The synthetic polypeptide of claim 6, wherein the size of an individual segment is from 
about 20 to about 60 amino acids. 

8. The synthetic polypeptide of claim 7, wherein the size of an individual segment is 
about 30 amino acids. 

9. The synthetic polypeptide of claim 7, comprising at least 30% of the parent polypeptide 
sequence. 

10. The synthetic polypeptide of claim 1, wherein at least one of said segments comprises 
partial sequence identity or homology to one or more other said segments. 

11. The synthetic polypeptide of claim 10, wherein the sequence identity or homology is 
contained at one or both ends of an individual segment.
12. The synthetic polypeptide of claim 11, wherein one or both ends of said segment
comprises at least 4 contiguous amino acids that are identical to, or homologous with, an 
amino acid sequence contained within one or more other of said segments. 

13. The synthetic polypeptide of claim 10, wherein the size of an individual segment is 
about twice the size of the sequence that is identical or homologous to the or each other 
said segment. 

14. The synthetic polypeptide of claim 13, wherein the size of an individual segment is 
about 30 amino acids and the size of the sequence that is identical or homologous to the or 
each other said segment is about 15 amino acids. 

15. The synthetic polypeptide of claim 1, wherein an optional spacer is interposed between 
some or all of the segments. 

16. The synthetic polypeptide of claim 15, wherein the spacer alters proteolytic processing
and/or presentation of adjacent segment(s).

17. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
neutral amino acid. 

18. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
alanine residue. 

19. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
associated with a disease or condition. 

20. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
selected from a polypeptide of a pathogenic organism, a cancer-associated polypeptide, an 
autoimmune disease-associated polypeptide, an allergy-associated polypeptide or a variant 
or derivative of these. 

21. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a
polypeptide of a virus.

22. The synthetic polypeptide of claim 21, wherein the virus is selected from a Human 
Immunodeficiency Virus (HIV) or a Hepatitis virus. 

23. The synthetic polypeptide of claim 22, wherein the virus is a Human 
Immunodeficiency Virus (HIV) and the at least one parent polypeptide is selected from 
env, gag, pol, vif, vpr, tat, rev, vpu and nef, or a combination thereof. 

24. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a 
cancer-associated polypeptide. 

25. The synthetic polypeptide of claim 24, wherein the cancer is melanoma. 

26. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen. 

27. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen selected from gp100, MART, TRP-1, Tyros, TRP2, 
MC1R, MUC1F, MUC1R or a combination thereof. 

28. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen. 

29. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen selected from BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, 
PRAME, TRP2IN2, NYNSO1a, NYNSO1b, LAGE1 or a combination thereof. 

30. A synthetic polynucleotide encoding a synthetic polypeptide comprising a plurality of 
different segments of at least one parent polypeptide, wherein the segments are linked 
together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide. 

31. A method for producing the synthetic polynucleotide encoding a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said method comprising: 

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of the at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

32. The method of claim 31, further comprising fragmenting the sequence of a respective 
parent polypeptide into fragments and linking said fragments together in a different 
relationship relative to their linkage in a respective parent polypeptide sequence. 

33. The method of claim 32, wherein the fragments are randomly linked together. 

34. The method of claim 31, further comprising reverse translating the sequence of a 
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence 
encoding said parent polypeptide or said segment. 

35. The method of claim 34, wherein an amino acid of a respective parent polypeptide 
sequence is reverse translated to provide a codon, which has higher translational efficiency 
than other synonymous codons in a cell of interest. 
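
The reverse translation of claims 34 and 35 can be pictured with the following minimal Python sketch, in which each amino acid is mapped to a single preferred codon. The table `PREFERRED_CODON` is an illustrative assumption only; it is not measured codon usage data and would need to be replaced with data for the cell of interest. The function name `reverse_translate` is likewise hypothetical.

```python
# Illustrative one-codon-per-residue table (an assumption for this sketch).
# Every entry is a valid codon for its amino acid in the standard genetic code.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def reverse_translate(peptide: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Return a nucleic acid sequence encoding `peptide`, one codon per residue."""
    codons = []
    for residue in peptide.upper():
        if residue not in table:
            raise ValueError(f"no codon available for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(reverse_translate("MAA"))
```

Because exactly one codon is emitted per residue, the output length is always three times the peptide length, which keeps later in-frame joining straightforward.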

36. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence that is refractory to 
the execution of a task. 

37. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence selected from a 
palindromic sequence or a duplicated sequence, which is refractory to the execution of a 
task selected from cloning or sequencing. 
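
One possible way to picture the codon choice of claims 36 and 37 is sketched below in Python: a candidate codon is rejected if, together with the bases already chosen upstream, it creates a short palindromic sequence, and a separate check reports duplicated sequences. The window lengths (6 bases for palindromes, 12 for duplicates) and the names `has_palindrome`, `has_duplicate` and `pick_codon` are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def has_palindrome(dna: str, length: int = 6) -> bool:
    """True if any window of `length` bases equals its own reverse complement."""
    return any(
        dna[i:i + length] == reverse_complement(dna[i:i + length])
        for i in range(len(dna) - length + 1)
    )


def has_duplicate(dna: str, length: int = 12) -> bool:
    """True if any window of `length` bases occurs more than once."""
    seen = set()
    for i in range(len(dna) - length + 1):
        window = dna[i:i + length]
        if window in seen:
            return True
        seen.add(window)
    return False


def pick_codon(upstream: str, candidates: list[str], pal_len: int = 6) -> str:
    """Return the first candidate (listed in order of preference) that does not
    create a palindrome overlapping the new codon; fall back to the first."""
    for codon in candidates:
        # Only windows that include at least one base of the new codon matter.
        context = (upstream + codon)[-(pal_len + 2):]
        if not has_palindrome(context, pal_len):
            return codon
    return candidates[0]


if __name__ == "__main__":
    # TCC after ...GGA would create GGATCC (a BamHI site), so AGC is chosen.
    print(pick_codon("ATGGGA", ["TCC", "AGC"]))
```

Restricting the check to windows that overlap the new codon means an unavoidable palindrome further upstream does not block every later choice.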

38. The method of claim 31, further comprising linking a spacer oligonucleotide encoding 
at least one spacer residue between segment-encoding nucleic acids. 

39. The method of claim 38, wherein the spacer oligonucleotide encodes 2 to 3 spacer 
residues. 

40. The method of claim 38 or claim 39, wherein the spacer residue is a neutral amino acid. 

41. The method of claim 38 or claim 39, wherein the spacer residue is alanine. 

42. The method of claim 31, further comprising linking in the same reading frame as other 
segment-containing nucleic acid sequences at least one variant nucleic acid sequence 
which encodes a variant segment having a homologous but not identical amino acid 
sequence relative to other encoded segments. 

43. The method of claim 42, wherein the variant segment comprises conserved and/or 
non-conserved amino acid differences relative to one or more other encoded segments. 

44. The method of claim 43, wherein the differences correspond to sequence 
polymorphisms. 

45. The method of claim 44, wherein degenerate bases are designed or built in to the at 
least one variant nucleic acid sequence to give rise to all desired homologous sequences. 
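
The effect of building degenerate bases into a variant nucleic acid sequence, as recited in claim 45, can be illustrated by enumerating every concrete sequence that a degenerate sequence stands for. The sketch below uses the standard IUPAC nucleotide ambiguity codes; the function name `expand` is an assumption made for this example.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate: str) -> list[str]:
    """List every concrete sequence represented by a degenerate sequence."""
    try:
        choices = [IUPAC[base] for base in degenerate.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown base code {exc.args[0]!r}") from None
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    # R stands for A or G, so RTG covers ATG (Met) and GTG (Val).
    print(expand("RTG"))
```

A single degenerate position in a codon can therefore encode two or more alternative residues at a polymorphic site, and the number of sequences grows multiplicatively with each additional degenerate position.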

46. The method of claim 31, further comprising optimising the codon composition of the 
synthetic polynucleotide such that it is translated efficiently by a host cell. 

47. A synthetic construct comprising a synthetic polynucleotide encoding a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage 
in the at least one parent polypeptide to impede, abrogate or otherwise alter at least one 
function associated with the parent polypeptide, wherein said synthetic polynucleotide is 
operably linked to a regulatory polynucleotide. 

48. The synthetic construct of claim 47, further including a nucleic acid sequence encoding 
an immunostimulatory molecule. 

49. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises a domain of an invasin protein (Inv). 

50. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises the sequence set forth in SEQ ID NO: 1467 or an immune stimulatory 
homologue thereof. 

51. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule. 

52. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule selected from a B7 molecule or an ICAM molecule. 

53. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a B7 
molecule or a biologically active fragment thereof, or a variant or derivative of these. 

54. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a 
cytokine selected from an interleukin, a lymphokine, tumour necrosis factor or an 
interferon. 

55. The synthetic construct of claim 48, wherein the immunostimulatory molecule is an 
immunomodulatory oligonucleotide. 

56. An immunopotentiating composition, comprising an immunopotentiating agent 
selected from the synthetic polypeptide of claim 1, the synthetic polynucleotide of claim 30 
or the synthetic construct of claim 47, together with a pharmaceutically acceptable carrier. 

57. The composition of claim 56, further comprising an adjuvant. 

58. A method for modulating an immune response, which response is preferably directed 
against a pathogen or a cancer, comprising administering to a patient in need of such 
treatment an effective amount of an immunopotentiating agent selected from the synthetic 
polypeptide of claim 1, the synthetic polynucleotide of claim 30, the synthetic construct of 
claim 47, or the composition of claim 56. 

59. A method for treatment and/or prophylaxis of a disease or condition, comprising 
administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the synthetic polypeptide of claim 
1, the synthetic polynucleotide of claim 30, the synthetic construct of claim 47, or the 
composition of claim 56. 

60. A computer program product for designing the sequence of a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said program product comprising: 

- code that receives as input the sequence of said at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that links together said fragments in a different relationship relative to their 
linkage in said parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

61. The computer program product of claim 60, further comprising code that randomly 
rearranges said fragments. 

62. The computer program product of claim 60, further comprising code that links the 
sequence of a spacer residue to the sequence of said at least one parent polypeptide or to 
said fragments. 

63. A computer program product for designing the sequence of a synthetic polynucleotide 
encoding a synthetic polypeptide comprising a plurality of different segments of at least 
one parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide, comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a 
nucleic acid sequence encoding said fragment; 

- code that links together in the same reading frame each said nucleic acid 
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in 
which said fragments are linked together in a different relationship relative to their 
linkage in the at least one parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

64. The computer program product of claim 63, further comprising code that randomly 
rearranges said nucleic acid sequences. 

65. The computer program product of claim 64, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon, 
which has higher translational efficiency than other synonymous codons in a cell of 
interest. 

66. The computer program product of claim 63, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon 
which, in the context of adjacent or local sequence elements, has a lower propensity of 
forming an undesirable sequence that is refractory to the execution of a task. 

67. The computer program product of claim 63, further comprising code that links a spacer 
oligonucleotide to one or more of said nucleic acid sequences. 
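
Linking segment-encoding nucleic acid sequences in the same reading frame, with a spacer oligonucleotide between them (claims 63 and 67), can be checked mechanically. The Python sketch below assumes a spacer oligonucleotide `GCTGCC`, which encodes two alanine residues; that default, and the name `link_in_frame`, are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def link_in_frame(coding_segments: list[str], spacer_oligo: str = "GCTGCC") -> str:
    """Join coding segments through `spacer_oligo` while preserving the frame.

    Each piece must be a whole number of codons and contain no in-frame stop
    codon, otherwise downstream segments would be frame-shifted or truncated.
    """
    pieces = [segment.upper() for segment in coding_segments]
    spacer = spacer_oligo.upper()
    for piece in [*pieces, spacer]:
        if len(piece) % 3 != 0:
            raise ValueError(f"{piece!r} is not a whole number of codons")
        codons = {piece[i:i + 3] for i in range(0, len(piece), 3)}
        if codons & STOP_CODONS:
            raise ValueError(f"{piece!r} contains an in-frame stop codon")
    return spacer.join(pieces)


if __name__ == "__main__":
    print(link_in_frame(["ATGAAA", "GGGTTT"]))
```

Validating each piece separately is sufficient here because every piece starts on a codon boundary once all pieces are multiples of three bases long.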

68. A computer for designing the sequence of a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein the segments are 
linked together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide, wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polypeptide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polypeptide sequence. 

69. The computer of claim 68, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments and 
linking together said fragments in a different relationship relative to their linkage in the 
sequence of said parent polypeptide. 

70. The computer of claim 68, wherein the processing of said machine readable data 
comprises randomly rearranging said fragments. 

71. The computer of claim 68, wherein the processing of said machine readable data 
comprises linking the sequence of a spacer residue to the sequence of said at least one 
parent polypeptide or to said fragments. 

72. A computer for designing the sequence of a synthetic polynucleotide encoding a 
synthetic polypeptide comprising a plurality of different segments of at least one parent 
polypeptide, wherein the segments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide to impede, abrogate or otherwise alter at 
least one function associated with the parent polypeptide, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polynucleotide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

73. The computer of claim 72, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid 
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

74. The computer of claim 72, wherein the processing of said machine readable data 
comprises randomly rearranging said nucleic acid sequences. 

75. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon, which has higher translational efficiency than other synonymous codons 
in a cell of interest. 

76. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon which, in the context of adjacent or local sequence elements, has a lower 
propensity of forming an undesirable sequence that is refractory to the execution of a task. 

77. The computer of claim 72, wherein the processing of said machine readable data 
comprises linking a spacer oligonucleotide to one or more of said nucleic acid sequences. 

Full-length construct (~17000 bp) after cloning the cassettes into pBS. 
Sites marked with * are in the pBS MCS. 

XbaI* BamHI | Cassette A (~5600 bp) | SalI/XhoI (destroyed) | Cassette B (~5600 bp) | XhoI (intact) | Cassette C (~5800 bp) | BglII EcoRI SalI* 

Cassette Extras (can be removed from cassette ends) 

Cassette    5' end                              3' end 
A (37 bp)   BamHI/Kozak, Start                  SalI, Stop, BglII, EcoRI 
            5' gc ggatccacc atg ...             ... gtcgac tga agatct gaattc gc 3' 
B (43 bp)   BamHI/Kozak, Start, XhoI            XhoI, Stop, BglII, EcoRI 
            5' gc ggatccacc atg ctcgag ...      ... ctcgag tga agatct gaattc gc 3' 
C (37 bp)   BamHI/Kozak, Start, XhoI            Stop, BglII, EcoRI 
            5' gc ggatccacc atg ctcgag ...      ... tga agatct gaattc gc 3' 

FIGURE 14 

Cassette Construction 

Full length: 5687 bp 
A1-A4: 3330 bp; A5-A8: 2500 bp 
Subcassettes: A1/A2, 1670 bp; A3/A4, 1670 bp; A5/A6, 1670 bp; A7, 840 bp 
Sites labelled at the joins in the diagram: XbaI, SpeI; NheI, SpeI; XbaI, SpeI; SpeI/XbaI; AvrII/NheI; AvrII/NheI; XbaI, NheI 

Subcassette Extras (can be removed from cassette ends) 

Subcassette                      5' end                          3' end 
SC1 (A 28 bp, B/C 34 bp)         As for 5' of Cassettes          SpeI, EcoRI 
                                                                 ... actagt gaattc gc 3' 
SC2 (28 bp)                      BamHI, XbaI                     NheI, EcoRI 
                                 5' gc ggatcc tctaga ...         ... gctagc gaattc gc 3' 
SC3 (28 bp)                      BamHI, SpeI                     AvrII, EcoRI 
                                 5' gc ggatcc actagt ...         ... cctagg gaattc gc 3' 
SC4 (28 bp)                      BamHI, NheI                     XbaI, EcoRI 
                                 5' gc ggatcc gctagc ...         ... tctaga gaattc gc 3' 
SC5 (28 bp)                      BamHI, SpeI                     AvrII, EcoRI 
                                 5' gc ggatcc actagt ...         ... cctagg gaattc gc 3' 
SC6 (28 bp)                      BamHI, NheI                     XbaI, EcoRI 
                                 5' gc ggatcc gctagc ...         ... tctaga gaattc gc 3' 
SC7 (37 bp), Cassettes A and B   BamHI, NheI                     As for 3' of Cassettes A/B 
                                 5' gc ggatcc gctagc ... 
SC7 (28 bp), Cassette C only     BamHI, NheI                     SpeI, EcoRI 
                                 5' gc ggatcc gctagc ...         ... actagt gaattc gc 3' 
SC8 (31 bp)                      BamHI, XbaI                     As for 3' of Cassette C 
                                 5' gc ggatcc tctaga ... 

FIGURE 14 (Cont) 

[Figure sheets showing the annotated nucleotide sequence of the construct: both strands, numbered from position 1, with the encoded amino acid sequence in one-letter code beneath, beginning at the BamHI site and Kozak sequence. The sequence itself is not reproduced here. Annotations in order of appearance (parent polypeptide, residue range as printed, segment number in parentheses): 

env 185-214 (149); gag 76-105 (6); pol 31-60 (36); pol 316-345 (55); pol 361-390 (58); pol 46-75 (37); tat 46-75 (121); pol 1-30 (34); rev 106-122 (131); spacers; gag 91-120 (7); pol 601-630 (74); env 46-75 (140); pol 76-105 (39); pol 196-225 (47); rev 16-45 (125); env 525-554 (171); env 31-60 (139); spacers; rev 1-30 (124); vif 16-45 (101); pol 661-690 (78); pol 916-945 (95); env 405-434 (163); gag 451-480 (31); pol 106-135 (41); vpr 46-75 (115); tat 31-61 (120); spacers; tat 1-30 (118); gag 466-495 (32); env 91-120 (143); pol 256-285 (51); pol 751-780 (84); pol 166-195 (45); pol 331-360 (56); [label illegible]; pol 796-825 (87); pol 346-375 (57); vif 166-192 (111); spacers; env 435-464 (165); gag 121-150 (9); env 480-509 (168); vif 106-135 (107); env 300-329 (156); pol 466-495 (65); pol 121-150 (42); pol 301-330 (54); nef 136-165 (188); pol 271-300 (52); env 315-344 (157); pol 451-480 (64); vpu 61-81 (136); spacers; vpr 61-90 (116); gag 406-435 (28); vif 61-90 (104); vpu 46-75 (135); env 510-539 (170); nef 151-180 (189); pol 961-990 (98); pol 18-45 (35); gag 390-420 (27); gag 421-450 (29); env 465-494 (167); nef 31-60 (181); spacers; vpu 1-30 (132); pol 136-165 (43); spacers; env 255-284 (153); pol 556-585 (71); gag 181-210 (13); pol 706-735 (81); gag 31-60 (3); env 215-244 (151); gag 1-30 (1); nef 91-120 (185); env 16-45 (138). 

Join markers legible in the margins: A2/A3 (near position 1680), A3 (near position 2480), A7 (near position 5760) and B2 (near position 6560).] 

NI»WVTVYYGVPVW R ' R XLLSGIVQQQX> 

"so env 330-359 (158) 6680 6690 6700 6710 6720 
* * * » *i* * * 

ACCTCCTGAGGGXTTATCCAAGCCCAACAGCATCTGC 

TGG AGGACTCCCGATAGCTTCGGGTTGTCGTAGACG AGGTCGAGTGGC AGACQC AGTCCGT AAAGGGGTCCGG AACCGAG 
WLLRAIB AOOHLLOLTVW'VRHPPRPWL> 
vpr 31-60 (114) 



(750 6760 6770 6780 6790 6800 



CACRRCCTCCCACAGYACATCTATCAGACATACGCAGACACATCCXM 
GTCYVGGACtrCTGTCHTCTAGATACTCTGTATGCCTCTC 
H X LG OX IYBTYGDTWXGVBAL'XA L I X P> 

6810 Vlf 151-180 (110) 6840 6850 6860 6870 6880. 

CAAAAACATTARGCCTCCtXrrCra^TCCGTGAAAA 

KKIXPPLPSVKKLTEDXWNXPQK X ' Y S> 

6 89 o «oo po , 901 .930 (94) 6,4 °. 6,s ? 696 ? 

CTCGCGAAAGGATTRTCGATATCATTGCAWCCGACA 

GACCGC i ri C CT A AYAGCTATAin , AA€GTWGGClCTAAGTCTG ATl t:CTTG^ 

AG BR XX DI IA X D I QTK E L Q X Q I X K . I Q N> 

6970 6980 6990 pol 886-915 (93) 703 J 703 J 7040 

TT(|c C T C T GT r rA TCCATAACTTTAACAC^ 
AAaCGACACAAATAGGTATTGAAATTCTCCT^ 
F ' AVF IHNFKRX6G ZGGY SAGERI X DX I> 

7050 7060 7070 7080 g a g 256-285 (18) 7110 7120 

cgccasccatatc|rttcccgtgcgccawatct^ 
cccctsgctataAaagggcacccoctwacatattctctacctact 

AXDI , XPVCXIYKRWXILCLMKIVRMY> 

7i3o 7140 7150. 7160 7170 env 495-524 (169) 720 J 

MACCCGTCAOCATTCTCCATAT<kGAGTG*GACAC^ 
KTGGGCAGTCGTAAGACCTATAhTCTCACTCTGTCC^ 

XPVSII.DI*RVRQGYSPLSPQTLXPAPR>' 
7210 7220 7230 7240 7250 7260 7270 7280 



CGCCCTGACAGACYCCRASCCATTGAGGAAGAOTCCAG 

CCCCCAC TGTt TGRGL" rTSCGTAAL TC C lTCTUAGGTCSGTCC TCGTAGTCATA(j*A>"XAARGGCTTGTCt^ AGACRGAGT 

gpdrxxxieeb'sxodhqypixbqplxq* 



tat 61-90(122) 



7310 7320 7330 7340 7350 7360 



t 



GMCAACGGGAGRCAATCCCACAGHCCCTRAGGAAAGCAAAAAd 


^B^^3gcactsctocactccatcaataaccaact i ca 


CKCTTCCCL'n t GTIAGGGTGTCYGGCA t TCCTTTCCOTTTtq 


IffSSsPI CCTCACCAGCTCACCTACTTATTCCTTGACT 



B2 
Join 

XRGXHPTXPXBSXKASCVVBS MMXEL> 33 

7370 pol 856-885 (91) 740 J 7410 , 742 °. 743 °, 744 J j 

AAAAGATTATCCCACACGTCAGGGAHCAG«rrGAGCACCTGAAAACC^ 

TTTTCTAATXCCCTCTCCXCrrtXtriTGTCCGACT^ GACACGTTTAchGACGCTACGTCTACGACTTC 
KKIXGQVR-XQAIHLKTAVQH l AAMQHtK> 



7450 - 7460 gag 196-225 (14) 7 «'° 



7500 7510 7520 



GAWACCATTAACGAAC AGGCTQCCGAGTGGGACAGARTCCATCCCGTCC^ RTTSCCCCTCTC ACCGMGAT 

CIWrCCTAA'JTGCTTC TCCGACCGCTCACCCTCTCTY AGGTAGGGCAGX^ACGGCCTGGGY AASGGGCApAGTGGCKCTA 
XT INEEAAEW DRXHPVH AGP XX P ■ L T X I> 

7530 7540 7550 DOl 1 81-210 (46) *580 7590 1600 

• • r * 7 * » * 

TTCTAMAGAAATGGAAVAAGAAGGCAAAATCTCCAR 
AACATKTCTTTACl"rrBTfClTlt.'(»^ 

CXEMEXEGKISXIGPENPYNTPXPA15» 



7610 7620 7630 7640 po! 871-900 (92) 7670 



7680 



AAGTGAGAGASCAAGCCGAACACCTCAAGACAGCXG^rCCAGATCGCAGTCTTCATTC^ 

TTCAC~it.TiTSLriTCCccTrcTTO 

QVRX0ABBLKTAVOMAVFIHNPXRXGG> 
7690 



7700 



7710 



pol 211-240(48) 



7740 



7750 



7760 



ATCGCAGGClAAAAAGAAAGATAGCACA*A^ 
TAGCCTCc JrmTllTlVJ ^TC GTC 
I GG'KK KDST KWRKLVDFRELNKRTQ DP3 



7770 



7780 



77 90 



7800 



env 540-569 (172) 783 J 784 ? 

CTGGGAGGTCCACCTCCXOTrTTYGGCTC^ 
GACCCTCCAGGTCGAGCCOAAAAlKXrCAGACCGAACCCrAC^ 

wbvqlg'fxa L A M O D LRS LCLPSYHR L> 



7850 



7860 



TR70 



7880 



7890 



7920 



vpr 76-96 (117) 

GAGACYTTATCCTCATCCyTCCCACAAYc|lt3CC^ 

ICUCGGYTG 

; 1 c x 



CTCTCRAATAGCACTAGCRACGGTCTTR C^WTTICTATCGTCTTAGCCGTAGTGATCCGTTGCATCTC SATCCTTGCCG 
RDX I LI X A R X ' C X HSR I G I TRQRRXRN G> 

spacers 



7950 7960 7970 env 1 55-1 84 (1 47) 8000 



KCCTCCAGGTCC CCTGCC CTCAAARTCIrfCCTTCCJMCCCATT^ 

HCGACGTCCAGt CGACGC CXX^rTYAGWQGAAGCTKGGGTAACGGTAAGTGATAACGC^ 

xsrs]aa|pkxxfxpipihycapacxail> 

8010 



8020 



8030 



8040 



80SO 



80B0 



vif 76-105(105) 

CAACTGTAACRATAAGAmTTCAATGcdGAA^ 
GTTCACATTCyTATTCTIQtAACTTACCa 

KCNXK XPNC'EXDWXLCXCVS I BWRX R> 



8090 



8100 



8110 



8120 



fi"o gag 481-499 (33) 81fi0 . 



GSTATAGCACAC 

CSATATCGT&TGTCCACCTGGGACTGGACCCCCTi 
XYSTQVDPX 



iTC 



ICTCTATCCTCCCTYAGCTTCCCTGAAAAGCCTCTTC 

ACATAGGACGCARTCGAAGGGACTTTTCGGAGAAG 

LADQPSLYPPXASLKSI,F> 



8170 



spacers 



8200 



8210 vjf 12 1-150 (108) 824 ? 



GGAAACGATCCCTXATCCCAJ GCCGCT AGAAGGGCTATCCTCGGCCAWAXAGTCACSAGAAGGTC^ 
CCTTTGCTACGGARTAGGGT1 CGCCGJ rCTTCCCCATAGGAGCCGGTWTMTC AGIXT Satri-rCCACACTCATACKCMCCCC 
CRDPXSQAA RRAILGXXVXRRCEYXXO 

8250 8260 



8270 



8280 



8290 



8300 



8310 



8320 



ACACAATAACGTCGGCTCCCTCCAATACCTCGCACTG 

TCTGTTATTCC ACCCGAGGGACGTTATGCAGCGTGAQ II I M S fGGGT K rJGGCCAACGWKGTTCACAATGACATlCTTTA 
HMKVGS L0Y LA&'SQPXTAC X X C Y C K K> 



8400 



tat 16-45 (119) 835 ? 836 J 837 °. pol 976-995 (99) 

(nTGCTWCCACTGTCAGSTCTGCTTCCr^^ 
CAACCAWCCTCACAGTCSAGACCAACCA C IT I CTI^^^ 

ccxhcqxcplxkglci'rdy G K Q M A G X D> 



84)0 



spacers 



8440 



8450 po! 721-750 (82) 



8480 



TGTGTGGCCRGCAGGCAAGACGAAGAC CCAGCC AAGrACCATAGCAATTGCAGAACCATGGCCARTCASTTT AACCTCCC 
ACACACCGGYCGTtXG'riX.'rtX'rfCTC CCTCGX TTCATGGTATCGTTAACCTCTTGGTACCGGTYACTSAAATTGGAGGG 
CVAXRQDB P i A A | KY HSHWRTMA XX PBL P> 

8490 8500 8510 8520 8530 8540 B550 



8560 



CCCTATCGTCSCTAAGGAAATCGTCGCAWRTTGCGATAAGTC 

CGCATAGCAGSCATIXX TT T AGC AGCGTVTf AACGCT ATTCAOpTGCTT ACCY GTCACCTTGACCACCTCCTTGACTTTK 
PIVXK E IV A XCDKC'MBWXL ELLE ELK>' 



vpr 16-45 (113) 



8590 



8600 



8610 



8620 



6630 



8640 



AWGAAGCCGTGAG 



ATGGCCTCGGTC AAC AuGAT RTC ATTAGCCTCTGGG ATCAGTC C 



TWCTTCGGCACTCTCTGAAAGGGTCTGGGAC^ 

X fi A V R H FPRPWLHGLGOHlDXISLWDQS> 
86so env 106-144 (144) 8680 8690 870 ° 87io e?2o 
* * / ♦ » # # # 

CTtyiAACCCTCTGTCAAACTGACA^ 
CACTTTGG(»CACACTTKSACTCTCGCG^ 
LXPCVKi,TPLCVTLNCTH-AllL T XXXYST> 

8730 8740 vlf 91 -1 20 (1 06) 8778 8780 8790 88 <>0 

C CAACTCCAC CCCCRrCTCCCIt^CCAWCTCATlCACC^ 
arrrCACCTWXCCYW^CCCACTGGTWC^CT 

QVDfXLADXLIHLHYFDCPXDSX lT H p> 

8810 8820 8830 net 1 66-1 95 (1 90) 8870 8880 

TSRGCCMACACGGAATGGAOGATGAGGAWAGGGAAGTtX^ 

XXXHGMEDEXRBVLXWKPDSXLAXRHX> 
8890 8900 8910 8920 pol 151-180 (44) 8950 8960 

ASSPXXTVPVXLkPGMDCPKVICQHPLT> 

8970 8980 8990 9000 9010 gag 436-465 (30) 9040 

• * * * * * ^ 

CGAACACAAAA T CAAAGCt|A TTTCGCCTAGCttR^^ YGC AGTCCARGCCTGACCCT ACCG 

t*.i ivivi o t AGI I rCodrAAACCOGATCGKYUI ILCC'ri^OGACCGTTAAAOGRCGTCAGGTYCtJGACTCGGATGGC 
BBKI KA'lWPSXKORPGl*FXQSXPEPT> 

90S0 9060 9070 9080 9090 Vlf 31-60 (102) "20 

CACCCCCAGCCGAGAR CTTITtGATTCGGOATTAGC AAAAAGGCTAA SGGATCGTTTTAC AGACACC AT7WCCA WAGCC31A 
CTCGGGGiraxrciTC^AAYCTAMgC^ 

APPAEXPXP G'l SKKAXCWFYRHHXXSX> 

9130 9140 9150 9160 9170 9180 9190 9200 

* * * * • » » * 

CACtXTAAGCTCAGCTCCCAGCTCCACATTCCCCTCG 
CTCCCATTCCACTCCAGCCTCCACCTCTAAGCCCACCCdTACT 
HPKVSSBVHZPX. C 1 M HTACQCVCCPXHX> 

gag 346-375 (24) 9230 9240 9250 9260 9270 9280 



^GCCAGGCTACTGGCAGAOGCTATCTCCC 
IXXOTCCCATCACCCTCTCCCATACAGGOTCCR^ 

ARVLAEAMSQXXXAK I 1 P PIVXKEIVA> 



9290 po| 736-765 (83) ^20 9330 9340 9350 9360 
* *.*•♦* 

RCTCT CACAAA TCCCACCTCAAGGCTCACGCTATKCAC^ 
YCACACTGTTTACGGTCGAXnTCCCACTCCCATAMCTCCCTGTCCAC^ 

XCDKCQLXCEAXHG0VXCSP , S8CXRQX> 

9370 9380 rev 31-60 (126) *410 9420 9430 9440 

AGGA RCAA CAGACGTAGAAGGTGGCGTCMGAGGCAAACCC AAATCC RCXCCATCTCCGACWCGATTCTGCCACACA TRAG 
TCCTYCTrGTCTGC^TCTTCC^CCGCACXCTCCCrPTX : ^ 
R X M R R R RWRXRQRQZXXIS EXIL'GQX R> 



9450 9460 9470 gag 226-255 (1 6) 9508 9518 9528 

GGAACCCACAGGCTCCGACATTCC CGGTACCACAAi^ACACTCCAAGAGCAAATCCS ATCG ATGAC AARCAATCCCCCf^R 
CCTTCGCTCTCCCAGGCTGTAACGGCCATGGI^n 

EPRCSDIACTTSTLQEQIXWMTXNPpJ 

9530 9540 955.0 9560 pol 841-870 (90) 9598 960 °. 

RCATTMAGCAAGAGTTTCGCA1TCCCTATAACCCTCAGTCCCAGCCCCTC 
YCTAAKTCCTTCTCAAACCCTAACCCATATTGGCAGTXACCCTCCC^ 

XIXQE PCI PYN PQSQG VVESMNKBLK K> 
9610 



9620 



9630 nef 106-135(186) *67o 9*80 

♦ • • • 

ATCATTGcdAGACAGGAGATCCTCGATCTCTC^^ 

T ACT AACUinCTGTCC TC TAGGAGCTAGAGACCC AGATGKTATGGGTTCCCA WAAAGGGACT6A (XGTSTTAATCTGTGG 

I I c'rqei ldlwvyxtqcxppdwxnytp> 



9690 



9700 



9710 



9720 



rev 46-75(127) 



9750 



9760 



CGGACCCGGAft¥CAGy*TAfjgs|^OT 
G<XTGGGCCTYR0TC7ATq§ggg^ 

GPGXRYPSRXRQRQIXXISEXILSXX> 



9770 9780 SV90 

* ♦ * 
TCCCCACAYCCGCTGA(XXrrinWCTCTGCAAi 

LCRXABPV 

98S0 9860 9870 

• • * 
TGCATGACCCA3ACACTGCTCGTGC A AAACCCT 

9950 



9800 



9810 gag 301-330 (21) 9840 



"AAGACACTGAGAGCXCAACAGGCTWCCCAAGASGTCAAGAAT 
tTrCTOTGACT C T CCCCTTG T CC GAWGCCTTCTSCA GrTCTTA 
PI.QL'XX.TLRABQAXQXVKN> 



9880 



9890 



9900 



9910 



9920 



:aaccctcactgJ3A 

kTTCCGACTGAC«Crri 
13 d n r* ' d 



lGARAGTGTATCTGKCTTGGCTCCCC cctcataa 
CTCTYTCACATAGACMGAACCCAGGGCCGAGTATT 
VYLXWVPAHK> 



9960 



9970 



9980 



9990 



X ' T D 

10060 



RTCS 

ATTGGGAGTCCTTYAGS 
P N P Q B X> 



10070 



pol 676-705 (79) 

ACCCATTGGCGGAAACCSAACAGCTtMACAAACrGGW 
TCCGTAACCGCCTTTOCTTCTI 

GI GGNBQVDKLVXXOZR 

iooio env 76-105 (142) 10040 10050 

WTCItSCAAAACGTCACCt3AGAACTTTAACATGTQ GAAAAA CRATA' 
WACA Ct: < nrr T CC ACTG LX ri l. T lt^ AATTCTA^ 

XLEWVTENFNHWKMXMVXQMX 

10090 ioioo env 170-199(148) 10130 10140 
• * * ' * * 

CTGAAATCCAATRACAAAAJtSITCAACCGAACTGGA^ 
GACTTTACGTTAYTCTTTT11SAAGTTGCCTTCXCCT 
LXCMXKXPIICTCPCXMVSXVOCTH C'X 



10000 



10080 



B 1 A G 
10150 



MACGGTAA 
X A X> 



10160 



E> 

10170 10180 10190 en v 600-629 (176) 10220 10230 10240 

GCTCAAGAWTA4XtXTKTlTC C CTGCT CA ^ 

LKXSAXSLLNATAIAVAXXTDRXXBV> 
10250 10260 10270 10280 vlf 46-75 (1 03) i0310 10320 

ytca<{tcccrgcatcccaaactctcca^ 

RAGrqAGGGYCGTAGGCrrrTCACAGCTCCCTTCACCT 

XQ'SXHPKVSSEVHIPLGXARLXIXTYW> 



10330 spacers 



10360 10370 nef 1-30 (179) 



10400 



gccctccasacaoa cctgc1 atgggccgtaaatcgtco 
ccggaggtstctca ccaccj taccccccatttaccacgttctccac^rgcagc^ 

glxtcIaaiiggxwsxxsxvgwpxvrbri> 

10410 



10420 



10430 



10440 



10450 



pol 496-525 (67) wwo 

CAca^CRCAscaxrrcccGCTcauax^ 

CATGMGATCCTWCTCCYCACCGCTATGGTTACTGC 
Y XRXRXAHTND> 



GTCTGYCYGTSCGGGACGGCGACTCCCTCA<|GAGTTC 

RXXXPAAECv'^XTGX 



t 



10490 10500 10510 1052O 10530 10540 10550 10560 B6 

[TGGGAGGSTCTGAAATACTXCXGGAATCTGCTC " 

Iaccctccsagactttatgamcmccttagacgag B7 

X N L l*> 



TC ARGC AACTGACAGMGG YTGTCC AAAAC ATTCOC AC AGAfjj||9TO 

agtycgttgactgtckccracacgtttw 

V'XQLTX X V 0 X I AT ESS 



B X L X Y X 
env 585-614 (175) i°S9° 10600 10610 10S20 ■ io«o 10640 

CWC?TACTGGGCCCWGGAACTGAAAAWCTCCGCCRTCAGCCTCC^ 
G^ATGACCCCGGWCCTKyvCTrrTWGAGGC^ 
XYWGXBLKXSAXS.LbHATA l rXL PEKXS> 



loss? p 0 | 391-420 (60) 



10680 10690 10700 10710 10720 



CTCGACXXTTCAACGATATCCAAAACCTCCTCCGAAACCTC 
GACCTGGCAGTTCCTATAGGTTTTOGAGCACOCTXTCGAGTTGAC^ 

wtvndioklvgklhwasqiyxg'rax B> 
10730 10740 l?n v 345-374 (15?) 10770 10790 10790 10000 

CTCACCAACACVmxrrCCAACTGACAGTGT^ 
CAGTCGTTtnCWACGACGTTGACTGTCACACCCCGTAATTCGT^ 

AQQHX LQLTVWGXKQI»QARVLAX E R Y 1 I»> 

10810 10B20 10830 DOl 631-660 (76) 10860 10870 10880 

* * ♦ r 1 • * * 

GCCCTCCAGGATAGCGGATYCGAAGTGAATATCC^ 

CGGGAGGTCCTATCGCCTARCCTTCACTTATAIXAGTGGCTATCGCTTATGC 
ALQ0SGXBVMIVTDS0YAI.CI I X A Q P D> 

10890 10900 10910 10920 en v 420-449 (164) 10950 10960 
ft • • * ♦ • 

CARAAG<{GAAAGCGAAAltrrCCAACTATACCAJmn*G^ 
GTYTTCqCTITCCCTTTAGAGGTTGA 

X S ■ B REISWYTXXI*XILTBSQHQQDR> 

10970 10980 10990 nooo 11010 en v 285-314 (155) 11040 
• * * * • 

ATGAGHAACASCTCCTOGCTCCCACAARGGCTAAGAGAAG 

TACTCKTI^^SGAGGAqCGAGGGTGTTY CCGATTC"rC*r fC CCAGCAC i I T 1 1- LA. 1 1 1 1 CGCACGGCAGCCGKAACCGCGA 
M B X X L I» APTXAKRRVVX REKRAVGXG A> 

11050 11060 11070 xioeo 11090 pd 91-120 (40) 11120 

ATGWTTYTCGGATTCCItrCCCGCTGCqAAACCCAA 
T ACWAAR*GCCTAACCA(XC(XreACCtm 
MXXCFLOAA'K PKMIGGI GCP I KVRQY D> 

11130 11140 11150 11160 11170 11180 11190 11200 

CC AAATCHTTATCGAAATCTGTGGAMAS AAGGCTATOT CCCTACGA 
GGTrrAGXAATAGCTTTAGACACCTXTSTrCCGATA^ 

QIXIBICGXKA I»S YHRI»RDPI LIXA R> 

env 555-584 (173) umo 11250 11260 11270 11200 

YTClGGAACTl^TCGGCCRTACCTCCCTGAJ^^ 
RACACCTTCACCACCCCCYATCCACCCACT^ 

XVBLLCXSSLXGLXR g't LNAWVKVXEB> . 

11290 gag 151-180 (11) 11320 1133 ? 11340 11350 11360 J 

TTCTSTft ^CTYACGGCTTCACTAAGCCTACAAAWGGCGAGACAGGCTCCCTC J 01 * 1 
KXPXPBV1PMPXALSBGATLBSM TXAM>C1 

H370 H380 nef 46.75(182) 11410 11420 11430 1144 °. { 

CAATSCCGATTCCGYCTKJGCTGRAAGCCCAGGAAGAGG 
GTTASGGCTAACCCRCACCGACYTTCCGGTCCTTXrrCCT^ 

NXDCXWLXAQBBEXVGFPVRPQVP I RA> 

11*50 env 630-651 (178) 11480 11450 spacers 11520 



GGAGGGCTATCCTClUCATTCXCASGAGGATTAGGCAAGGCr GCCGCC GAATGCGATAGGRTT 

CCTCCCGATACGAGKTGTAAGGGTSCTCCTAATCCGTTCCGR^ CGGCGC CTTACCCTATCCYAA 

XRAILXI PXRIRQCXERALLAA BWDRX> 



A A 
11530 »m. gag 211-240 (15) » 58 ? 1159 ? 

H pVHAGPXXPGO 



■AG 
»GTC 
R> 



„ef76-105(184) »«» "« 7 ° " 68 ° 




11770 U 780 U790 "Boo iisio po | 481-510 (66) 1184 ! 

* * »«. »••: "'" 



GTAT 
CATA 



% „ Btn H980 12000 

gag 31*345 (ZZ) . :,™ a «^acccctcagcgaagcc 




CCG 
G> 



1tt.1Q>Sff» "040 12050 12060 12070 12080 

12010 gag 166-195 (iZ) x * A 




12090 ^-a - • - w ■ 

m70 mso 12190 pol 241-270 (50) »"° 12230 ? 

pol 541-570 (70) 



moo gag 241-270 (17) "no »»» »*T ^ 




12330 "340 



12350 12>60 12310 nef 121-150 (187) 




cc 

GGAG 

.. i d d n if a n 2 * - - 

12480 



12440 12450 pol 571-600 (72) 
12490 12500 12510 12520 gag 136-165 (1 0) 12550 12560 

* * I * * « 

GCCGCCARCAGAGAGACAAACCTCGGOCAAAACSY^ 

CGGCGGT ¥ &TCTCTCl*TirrCCAGCcdGTTTTC;S RCGTCCCTGTCTACCACGTACTCSGA KAATCGGGGTCCTGGGAGTT 
A A X R E TKLc'QNXQGQHVHQXXSP R T L N> 



12570 12580 12590 13600 12610 en V 61-90 (141) 12640 

CGCTTCGCTCAACCTCfmxaACACA^ 
GCGAACCCACTTCCAGYAGCTTCTCTTTCSCAAATYC^ 

AWVXVXBBXXFX , XTBVHWVWATHACV> 

12650 12660 12670 12680 12690 12700 12710 12720 



CTACCCATCCCAATCCCCAAGAGRTTSVKTCTGGAGA^ 
GATGCCTAQGGTTACGCGTTCTCYAASWGCACCTCTTACACTC 

PTDPN PQEXXX.EJIVTE'l,KDQXXLGXWG> 



env 375-404 M61) 12750 12760 12770 12780 12790 12800 

* * • * * '* 

TOCTCCGGCAAAKrCATTTGCACAACCRNTOTGCC^TGGAACACCW 
ACGACGCCGTTTKACTAAAC G JGr rCCYKACACGGAACCTTGTCGWCGACCACCTItCT 

csgkx icttxvpwhsxwsh'xxghnkvo 

12810 Vff 136-165 (109) 12840 12850 12860 12870 12880 

* * * ♦ , * • 

hACCCTCCM»rATCTtiGCTCT CM HSCCTCTX^TThHCCCTAKGAAAATCAHACCC3C CTCTCCCTAGQGYTAAGACAATCA 
TTCGCATOTCATAGACCGAGACTWCtXAGACTAATKCGGATTCTTT 

SLQYLALXALIXPKKXXPPLFS'x KTI> 



TTGTGCATCTGAATRAGTCCGTGGWAATCAATTGCACAAGGCCTAJ^ 
AACACCTAGACTTAYTCACGCACCVITAGTTAACG'I^n^ 

I VHLNX SVXINCTRPXNMTRX 



12890 12900 env 230-254 (152) 12930 spacers 12960 
* * 1 9 # ^ 1 1 * 

™ , A C2 

Join 



A S E X> 



12970 12980 12990 gag 106-135 (8) 13020 13030 13040 

C A G AAW AAGTCCMAACACAAAACCC AGC AAGCCGCCGCC GATACAGGC AACTCCAGCMAGGTCAGCCAAAACTATCC CAT 

c-rcrTwiTCACCK'jrivjtJ , rrrnxcTCCTTCCGCGCcc^ 

QXKSXORTOOAAAOrGXSSXVSQMyp I> 

13050 13060 13070 13080 DOl 826-855 (89) 13110 13120 

I * * * * * * 

TGTOTCCAACTITACCTCCRCCRCT^ 

V*S K PTSXXVKAACWWAXIXQBPG I P Y> 
13130 13140 13150 13160 13170 pol 586-615 (73) 13200 

atccccaaagccaaUcattctatgtggatgccgctg^ 

TAGGGGTrAt.XJUTJf Ft>TAA GATAC ACCT ACCCCGACGGTY AltlCC'lM'l^n^rTCACCCri^rCCGACCGATAC ACTGTC TG 
n p q s q't PYVDCAAXRETKLCXACYVTD> 



C3 



13210 



13220 13230 13240 13250 pol 766-795 (85) 33260 

• * * * * * * * 

ACACCCACACACAAARTCRTTAcdcGAATCrCCCAC 

TCTCC i/ I X. 'lt « 'f LT rrYA<^AATcdcCTTAGACCGTCGACCTG 

R C R Q K X X sic XWQLDCTHLBCKXILVA V> 

13290 13300 13310 13320 13330 13340 13350 13360 

» • # * * * * * 

CCACCTCGCCTCCGGCTACATTCAGCCTGAGG 

ggtccaccggagcccgatgtaactcccactccaaccgttactc^ 
hvasgy i e a ev'gn eqvdklvxxgi rk> 

doI 691-720 (80) 13390 13400 13410 13420 13430 13440 

^ * ' • » I • * # • 

TCCTATTCCTCGACCCAATXmATAACGCTCACC^ 
ACCATAAGGACCTtXCTTACYTATrCCCAGTCCTTCTCCTCCT^ 

VLFLDG ZXXA Q E E H BVR ERIRXXXPAA> 
net 16-45 (180) 13470 13*90 13500 13510 13520 

* * * * # ♦ * * 

GAAGGCGTCGCCCCTGY CTCCCTCGGATCTGGAT AACX ACGCAGCCMTC ACCTCGAC^ 
CTTCCGCAGCCGCGACRGAGCGYCCTAGACCTATTCMTGCCT^ 

egv-gaxsxdldkxgaxts'tsgt 



CTCCC 
Q Q S 0 C> 

13530 rev 91-120(130) 1**60 13570 13580 13590 13600 

AXCTOAAACTGGCCTCCCOPKXCTCAGATTTC 

TACCAGTAGACCC 



TBTCVCX PQIXGESSXXLGXg'sI 



V I w> 



13610 13620 pol 526-555 (69) 13650 13660 spacers 



GTAAAACCCCTAAGTTTARGCICCCCATTCAGARAGAGACATGGG^ SCTGCI 



CATTTTCGGGATTCAAATYCT^GOGGTAAGTCTrrert^ 
GXTPXFXfcPIQXE TWBXWWXXYWQA 



lCGA 
A A> 



13690 13700 13710 en V 140-169 (146) 13 ™0 13 ?50 13760 

TACAGACTGATCARCTCTAACACAAGCGnTTATCAMACAGGCTltX^ 
ATCrrCTGACTAGTyGACATTGTCTTCGCRATAfrrXTC^ 
YRLIXCRTSXIXQACPKXXPXPIPIHY> . 

13770 13780 13790 13800 DOl 376-405 (59) 1*830 13840 | 

* * * * * * /VQ 

CTGTGtJC lci&BBa^^ . . 

GACACCGCGj{||^SACCTACCCGATACTCGAGGT^ JOin 

CAPPSWMCYBLHPDRWTVQPZXLPBK> Q4 

13650 13860 13870 13880 13890 gag 331-360 (23) 13920 1 

* • I « • • • J 
ASTCCTCCACACTCAATGACATTCAQAAAMCAATTCTGARAGCCCTC^ 

TSJMXACCTGTCACTTACTGTAAGT Of T f t# G 1 1 AAGACTYTCGCCAGCCCXCTCCGCGAMGGGACCTCC TTTACTACTGT 
XSWTVMDI Q K X I I* X A 1* G X GAXLEBMM T> 

13930 13940 13950 139(0 13970 13980 13990 14000 

* » * . • • • * * 

GCATGTCAGGGAGTGGCAGGCCXTRGCCATAAGCC - ^ 
CCTACAfaTCCX^ C ACCCTCCCCCAYCGCTATTCC^^ 
ACQCVCCPXHKA , RVYYRDSRDPXWKGP> 



pol 931-960(96) 



14030 14040 14050 14060 14070 14080 



TGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCXAA 

ACGCTTTCACGACALt.TrJl.CU. 1 1 CCl X GACACCAGTAGGTTCTuY AATTCTAACCTCCGOTTG ACTWTCTTCGGGAGG 
AKLI.VKGEGAVVI0O l XKXGG0I*XBAX^ 



"030 pol 61-90 (38) 



14120 14130 14140 14150 14160 



TGGATACAGGAGCCGATGACACCGTCCTGGAAGAWATSAATCTGCCTGGC^ 
ACCTATGTCCTCGGCTACTGTCCCACCACCT^^ 

LDTGADDTVLBXXNLPCX w'g I K Q L Q A R> 

14170 14180 env 360-389 (160) 14210 14220 spacers^ 

• • * ' * * r 1 



GTCCKXXrrRTCCAGAGGTATCTGAAACATC GCTGCT ATGGAAAA 

CAGGACCGAYAGCTCTCCATACACTTTCTACTO CCACG/ rACCTTTT 

VLAXBRYLKDOXXLG'XWCCSG K I A A I M E N> 



14250 14260 14270 Vff 1-30 (100) 14300 14310 14320 

* * t » * * • • 

CAGATGQCAACTGHTCATCGTCTGGCAACTGCACAGGATGARGATTAG 
GTCTACCGTTCACKACTAGCAGACCGTTCACCTGTCCTA 

RV#QVXIVWOVDRMXIRTWXSLVXHHM> 



14330 14340 14350 14360 en V 390-419 (162) 14390 

* * ♦ • % * • 



144 00 



ATfc«TATCrGTACCACARHCGTCCCCTG^ 

T WAAT AGAC ATGGTGTYKGC AGCGGACX^TTG AO S CACCTC GTT ATTCAGG RAGCTTCTCTAAACCYTATTGTACTGG 
X»X ICTTXVPWNSXWSNKSXBBI WXNMT> 
14410 



14420 



14430 VpU 16-45(133) 14460 14470 14480 



TGCATKSAATcdcTGATWrcCCTATCGTCGTGTGGACCATTC 
ACCTAMSTTACdG*CTAAKAGCG^^ 
W X X W ' L I X A I V V W T I X X I 



14490 



EYXKLLXQR X> 
14550 14S60 



14500 14510 14520 gag 46-75 (4) 

• I * • w 

AATCOTAGCCTCATC^ 

TTACCTATCCGACrrAGTTTTCaCACTTCGGACCCCA^ 

I DR LI X R L N P C L LETXBC CX Q I L X Q L> 



14570 



14580 



14640 



14590 14600 14610 14620 14630 

* * * • » • * * 

AG_rcrcCCTC»lACACA<XX^CCC_\AG^ 

2 ^ ? T X T U TlT L Tt' yCTAACTGTCTGACTAAYTC 

GXSBI*5SKKLLX0RXZ 



Q X A L X T 



VpU 31-60(134) 14670 146B0 14650 



14700 



D R L I X> 
14710 14720 



AGAAYCACAGACACAGCCGAAGACTCCCCCAA'. _______ __ ._ _ _ 

RXRERABDSCWHSB 



LCACCCGGAATCAGATACCAATACAATGTGCT 
htnXXXXCTTAGTtCTATGGTTATGTTACACG^ 
C D T PGIRYQYNV L> 



14730 pd 286-31 5 (53) 

CCCCCAAGGCIXKAAGCCCTCCCC 
P Q O W K G S 
14810 



14760 



14770 



14780 



XASCCATTTTCCAAAGCTCTCATG^ 
X-TI^GGTAAAAGGTTTI-tiAGXyrAC KG^MTl AGGAdTAI 

pxipqssmxxil'm 



14790 



14800 



ltg_atgcaaac*5ggaaacttta 
Jtactacgtttcccctttgaaat 

M O R G N P> 



RGGCACMGAAAAGGA' 
YCCCTGKC_TTtCCTAATAGTTCAC( 
XCXKRIXICCr 



14820 gag 376-405 (26) i-aso i486o 

-TMT C OCTARCAA' 
NCGKEGHXAXNC 



14870 



14660 



R P P L B> 



14690 



14900 



14910 



rev 76-105 (129) 1494 J 



14950 



14960 



AGAC-^ACCTCGATTCCTCCTSAGGATW-^ 
TCK»CKTCCACCTAACCAGCCriW-^^ 
R L X D C S B D X X T 5 C ? Q Q S - - 



QCTETGVQ I*> 



14970 



14980 



14990 



15000 



pol 781-810 (86) "030 15040 

CGrrOGC-XOTCATGTGGCCAGCGGATATATCG 
GCACCGACACCTA<-HCCGGTCGCXrrATATACCTTCGCC^ 

VAVHVASGY X B A E V X 



1S050 



PAETGQBTAYP> 
15090 



3L5060 15070 1S080 15090 en V 200-229 (150) 15120 

«X_TCAAd^TTARGCCT_TOnCACC^ 
AGCACTTOTAATYC^GGACACCACTCCTCTCreGAGC^ Y ACYAATAGTCTTCG 
X L X 1 ! XPVVSTQLLLMCSLABBBXXIR 



X'lXPVVST 
15130 15140 15150 



S> 



15160 



15170 



CAAAACYTTACCRA TAAcjhAA^T OC' 
C VITK-RAATCCYTA I TOI 1 1 GACXZ 
BNXTXIi'KL' 



pol 406-435 (61) "200 

T_xritrcccAAACTGAArPGG«rr_tx^ 

'AGCCGTTTGACTTAACCtXAACGGTTrACATGSCA^ 

VCKLNft#ASQlYXCIKVXQL> 

15220 15230 15240 15250 en V 1 21-139 (145) 

* . * * * * ' * 

GTICTAAOCTCC-t^AGAGCCRCCAAACX^^ . 
CACATTCGAGGACTCTCCGYCGTTTCGaCACTGGC^ 

C KLLRGX Ka'lt P LCVTLNCTN AN L I N> 



15210 



„ s P acer s 1531 ? l53J !| tat 76-102 (123) 1536 ° 

tcaatgctgci caa>ccac-*cu_cgataaccctacccrtccc 

ACTPXCCACGy grnttXaCTCCGCTATTGGCATGGCYACCX^^^ 



Ml A A 



QXRG DNPTX PXESXKXVX S K X E T> 
spacers 



15390 



15400 




rev 61-90 (128) 



15430 



15440 



»COOICTKTCTCGGAA<raCTGCCCAAC 

iCGTKCAMAGACCCTTC CRGACGGCTTGGCCAGCCCGAGGTCGAGGGGGGAG ACCT 
PSSXXLGRXA£PVPLQLPPLE> 



15460 



15470 



15480 



15490 



15500 



15510 



15520 




AACCCTCMACCTCGACTGTAGCGAAGAO^GTC 
rrCCCJUSKTCGAGCTGACAT 

RLXLDCS BDXX'XL 



vTAAClX30GCCTCCCTt»rcGAACTGGriO<ATATCWCCA 
TATTCACCCGGAGCCACACtrTTGACCAAGYTATAGWGGT 
DKWASLWNWPX I X> 



env 450-479 (166) 1555 ? 



15560 



15570 



15580 



15590 



15600 



ASTGCCTGTGGTACATrAAGATTTrCATTATGArTGTC 
TSACCGACACCATGTAATTCTAAAACTAATACTA/. 7..CCC 
XWLWYZKIFIHIVGO' 



15610 




iTAAGATTGTCAGGATGTACYMACCTGTCTCCATC 
' ATTCT AACAGTCCT ACATGRXTGGAC AGAGGT AG 
WXIVRMYXPVSI> 



gag 271-300 (19) 15<s ? "™ 1568 .° 

CTCtiACATTARCCAAGCCCCTAAGGAACCCTTCAGGGATTA^ 
GAGCTCTAATYCCTTCCGGGATTtXTrTGCGAAGW 

LOI XQGPXBPFR D Y V D R F A R L I* W K GE G> 



15690 



15700 



pol 946-975 (97) * 573 J 



15740 



15750 



15760 



AGCCGTCOTCATTCACGACAACTCCGACAT^ 

Ttrax:AGCACTAA<yi*:cuvrre*ccCTGTAA 

A VV IQDKSOIXVVPRRXAKII , ELHKR> 



15770 



15780 



15790 



pol 226-255 (49) 



spacers 



CCCAACACTTTIXXXaU*CTCCAAC7*ra OCCGC1 
CCCTTCT G AAAACCCTICACGTrGACCCTTAGGCAG^^ 1 1 CAGGCAl fi- J H_A< CGGCO 

TQ DFWBVQLGI PHPAGLXKKK SVT V | A Aa 



15B50 



15860 



15870 



1S880 



env 1-30 (137) 



15910 



15920 



ATCAGAGTGAAAGAGACACAGATCAACTGGCCXTAATCT 

TACTCTCACn 111 IC TGTGTCT ACTTGACCGGGTTAGACAC CTYCACCCCXn^TKACTAAGACCCTKACCAGTAST AAAC 
MAVKBTQMMWPWl.irXlfGrXrtiGXVXZC> 



15930 

CTCCGC 



15940 



15950 



1S960 



15970 




pol 421-450 (62) 160 °; 



iTTAAGCTCARACAGCTCTGCAAACTGCTCACGCCTRCAAA^ 

r AATTCCAGinrTGTCGAGACCTTTGACGACTCCCCAY G XGTGACTGTC 
SAS'lXVXQLCKLLRCXEALTXlVXLT^ 



16010 16020 16030 

AGGAAGCCGAACTCGAACTQCTCAHA' 
TCCTTCGGCTTCACCTTCAOGACTWTi 

ebablel'lx 



16040 16050 net 181-196(191) lsoao 



16090 . 
GAGTWCTACAAAGA 



spacers 



lTATSGCCAGGGAACTGCRTCCC 
'AfXTTTC AAACTGAGGGYtXjAGCGGGMCTCTCT SCGGTC CCTTGACGY AGOG 
HXF.DSXLAXRKXARB1.XP> 



16120 



m30 . env 570-59S (174) x$l6 ° 

0 ^ L ^ , , w _ m _ . ....... „ , ,„ „ ,, ^^^.JRCTCCAGCCTCARCGGACTGCRAAGGGGATG 

CTCAWGATGTTTCTGACC CGACO ^CClXX»GGAtCC7^GAGCTCCGAGTYCCC7^ACG^ 



SXYXDCAA 
16170 16180 



VBLLGXSSLXGLXRGWEXL> 
16190 16200 16210 16220 16230 16240 



C5 
join 
C6 

i 



CAACTATTKGXGGAACCTCCTGCWGTATTGGGOq 
GTTCATAAMCHCCTTGGAGGACGHCATAACCCCtf 
XYXXNLLXYHGS 



3 CTGGRGCAACTGCAAYCTGCTCTGMAAACCGCAWC AG AGG W> 
gGACCYCGTTGACGTTRGACGAGACXTTTGGCCTVKmrrCC join 
LXQLQXALXTGXE> Q7 



gag 61-90 (5) 



16270 



16280 



16290 



16300 



16310 



16320 



AACTGARGTCCCTCTWTAACACARTCGCTACCCTCTGG 
T^TGACTYCAGGGACAHATnnGTYAGCGATtXXSAGACCACACACGTAGTO 

EL XSLXMTX ATLHCVHQ BLYKY KVVX I> 
16330 env 270-299 (154) 



16360 



16370 



16380 



16390 



16400 



RAACCCCTCGGCRTTGCCCCTACCARAGCCAAAAGGAGAGTGGTCSAGAG 
YTTCGGCACCCGY AACCGGCATU/rYTCC GI I TTCC 
X P L G X A P T X AXRRVVXRE X R 1 




WATCGTCMCACT 
'AGCAGKGTGA 
L 7? X I V X L> 



16410 



16420 



pol 438-465 (63) 



16450 



16460 



16470 16480 

CACCGAAGAGGCTGAGCTGGAGCTCGWGGAAAA^ 
CyroCCT TCTCCOJyCTCGJtCCTCGAPCKCCTTTTO 

T E E A B I* ELXBNREI LXEPVHGV y'r V L> 



16490 16500 16510 gag 361*390 (25) 16S4 J 



16SS0 16560 
CCGAACCCATCACCCAACTCAMCKATGCCAACAT^ 

GGCTCCCTACTC 0XXXAi|IMMQRG)JpxcxJCR j x *£J 



16570 



16580 



16590 



16600 



nef 61-90 (183) 



16630 



16640 



CAACAGGAAGAGGRGGTCGGCTTCCCCGTCAGGCCrcAGGTCC^ 

G Tll IIXITL'JIX > CCACCCGAACGGCC AGTCCGGAGTCXIAQGGTCACTCTGG ATACTGGATCTTT^ 
QBEBXVGPPVRPQVPLRPMTYXXAXDL> 



16660 



16670 



16680 16690 ga g 286-315 (20) X * 12 ° 

^CAAACytGCCTTTCAGAGACTATGTGGATAGtrrrTTW 

ATACACCTATCCAAAAM C T IT P GCG AGTCCCCA C T C X FX VC 
F' X Q C PR BP PR DYVDRP X X T I* R A B Q> 




16740 



16750 



16760 



16770 



CCWCACAGGAWGTGAAAAACTTGGGAGAAAATCAGACTGAG 
COWUTG TC CTWCA CIT r rritl ACC L " !^ 

axqxvxm'wbxirlr 



gag 16-45 (2) 



16000 



kGACCTGCTGGCAAAAAGAAATACARAMrGAAACAOfTPGTG 
CTCTCCACCA LCU ' rPl - Plt. ' /T rATCTYTXACTTTCrGXAACAC 
PGGKKXYXXKHXV> 



16810 



16820 



16B50 



pol 646-675 (77) 



16880 



16830 16840 
XtXX3CCTCCAGGGAACTGGAAAGGTTTGC<ITCC 

ACCCCGAGGTCCCTTGACCTTTCCAAACGOAGGGTCATACGGGAGCCG^ 
P A 1 S Q Y 



WASRBLBR 

16890 16900 16910 



A L G 



XAQPDXSBS> 
16940 16950 16960 



16920 16930 

* » * • • 

CGACSTWrCARTCAGATTATCCAAVAGCTCATCAAGAA£^ 

ACCGGCYTHC L lt ^ TLI V I V rYAGTAACTCCACC 
XX TDRXIBV> 



CCTCSACCACTY ACTCTAATAGCTTB^CGAGTAGTICTTOTAACGGCAGCGG 
EXVXQIIBXLIK X I A V A 



env 615-644 (177) 17000 17010 



17020 



17030 



17040 



C7 



YCCAAAGCCCTItGGAGAGCCATTCTGMATATCCrc^ 

rggtttcccgai^ctctcggtaagacxtataggggtsctcttagtctg^ Join 

X Q R A X R A I I* X I PXRIRQTRLAGRWPV X> C8 



17050 



pol 811-840 (88) 



17080 



17090 



1710O 



17110 



17120 



R 7 AATCCAT ACCGATAACGGAA<XAATTTCACAAGCRCTRCCGTC AAGGCTTGCCTCCTCGTCCCC 
Y RTTAGCTATGGCTATTGCCTTCGTTAAAGTGTTCGY GA YGCCACTTCCGACQGACG ACC A CO 
XIHTDWGSNPTSXXVRAACWW 



CCDGATGTGARi 
ICGJtCTACACTY* 
A D V X 



TGTGARACAGCT 
TYTGTCCA 
Q L> 



17130 17140 pol 511-540 (68) 1717 J 



i7iBo 17190 spacers 



CACCCHAGYCCTCCAGAAARTCGCTACCGAAAGCATTGTO CCTC 
GTGGCKTCRGCAGGTCTrrYAGCGATGGCTL'1'CGl AACACTATACCCL'U*l'i^TGl^^X^^l^AACTYTCACGGATACjCCAg 
TXXVQXXATESXVIWGR TPKPXLPI 



^P acerg > Bglll EcoRI 



cdGCCAGCAACGAGAACATGGASRCCA*K GCTGC1 Ity»GATcJGAATT<jGCC 
CC CGGTCGTTGCTCTTGTACCTSYGGTM CGACG* ACTTTCTAGJ 
Aj A SMEMMXXM I A A I J 1 R S 

Flu NP epi (Mouse) Stop 



CTPAAClCGG 
E F A> 
10 



20 



30 



40 



50 



60 



70 



60 



CCGCCTA«rrGGTAClCTCCGGGAAC^ 
C.G5TMTGPCXNVSXVQCTHGIXPVVST> 



90 



100 110 120 130 140 150 160 

##•*** ♦ 

Itccctcaraagcctctocaat^ccrto 



Q L L- L N G S L X S LXNTXATLWCVHQRI X> 

170 180 190 200 210 220 230 240 

* « • * * • . • * 

TCARGGACACAi^GCAAGCCCTC GACAAAATC 

VXDTREALDKI ELGDGCGAXRQGTSS S> 

250 260 270 280 290 300 310 320 



YTCARCTTTCCACAAATC^ 

xxfpQlTLWQ RpLVTEppRX) j NpxMVI> 

330 340 350 360 370 380 390 400 

TTACC^TACATGGATOATyTCTATnCT 

YQYMDDLYVGSDLEIGQHFTTPDKKH* 

410 420 430 440 450 460 470 480 

AAAAGGA^U^ , ? , ^ „.„ « m .- - — 

J __ _ rATC^ 

Q K B P P F L W M G Y ELHPDRWTVQPXX F P Q> 



490 



500 



510 



520 



530 



540 



550 



560 



JTLWQRPLVTX KIGGQLX EALLOTGS X> 



570 



580 



590 



600 



610 



620 



630 



640 



ACCGTCTTTCTTTGCATCCGTTGCATCTSCGCGAGGAGTCTCGTCKYTCCT 

GRKKRRQRRXA PQSXXDHQYPIXEQ P> 



650 660 



670 



680 



690 



700 



710 



720 



AGRGGAAGAAATCCCTTTTGGACaSAAAGGKCGTTCCAYTTCGGTCTCTC 

LXPFR E N L A FXQGXAREFXS E Q T X A N S> 



730 



740 



750 76C 770 780 790 800 

* • • * * 

tSAAAGCntlOGYCRTTCTGGGAYCTGGCACCAAAAACGCCGCTACTAG 



RGGYGGAGG1 

XXSRK SPQISGESSXXLGX6TKN A A T S> 



810 
t 

TGAATTCGCC 



B P A> 
Input parent polypeptide sequence(s) 



Add alanine spacers to the ends of 
each polypeptide sequence for processing 



Optional 



Fragment polypeptide sequence(s) into 
fragments (e.g., 30 aa) which are 
preferably overlapping (e.g., by 15 aa) 



Reverse translate the fragments to provide 
a nucleic acid sequence for each fragment 



Scramble or randomly rearrange 
the fragments 



Yes 



Have any fragments been placed 
together to recreate at least a portion 
of the parent polypeptide sequence? 



No 



Link the rearranged fragments together 
to create a synthetic polypeptide sequence 
and/or a synthetic polynucleotide sequence 



Output the synthetic polypeptide sequence 
and/or the synthetic polynucleotide sequence 
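
The fragmentation, scrambling and adjacency-check steps of the flowchart above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the listing reproduced in this specification: the function names (fragment_offsets, recreates_parent, scramble), the MAX_FRAGS limit, the treatment of a shorter final fragment, and the rule used to decide that two neighbouring fragments "recreate" part of the parent are all hypothetical choices made for the example. Only a single parent sequence is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAGS 64   /* hypothetical upper limit for this sketch */

/* Fill offsets[] with the 0-based start of each fragment and return the count.
   A fragment starts every (frag_len - overlap) residues; the final fragment
   may be shorter than frag_len (assumed handling of the tail). */
int fragment_offsets(int seq_len, int frag_len, int overlap, int *offsets, int max)
{
    int step = frag_len - overlap;
    int n = 0, off;
    assert(offsets != NULL);
    if (seq_len <= 0 || step <= 0) return 0;    /* overlap must be < frag_len */
    for (off = 0; n < max; off += step) {
        offsets[n++] = off;
        if (off + frag_len >= seq_len) break;   /* this fragment reaches the end */
    }
    return n;
}

/* Return 1 if any fragment is directly followed by one that continues the
   parent sequence, i.e. starts inside, or immediately after, its predecessor
   (assumed reading of the flowchart's decision box). */
int recreates_parent(const int *order, const int *offsets, int n, int frag_len)
{
    int k;
    for (k = 0; k + 1 < n; k++) {
        int a = offsets[order[k]], b = offsets[order[k + 1]];
        if (b > a && b <= a + frag_len) return 1;
    }
    return 0;
}

/* Fisher-Yates shuffle, repeated until no parent stretch is recreated.
   Returns the number of shuffles tried, or 0 if max_tries was exhausted. */
int scramble(int *order, const int *offsets, int n, int frag_len, int max_tries)
{
    int t, i;
    for (t = 1; t <= max_tries; t++) {
        for (i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);           /* slight modulo bias is acceptable here */
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        if (!recreates_parent(order, offsets, n, frag_len)) return t;
    }
    return 0;
}

#ifdef SCRAMBLE_DEMO   /* compile with -DSCRAMBLE_DEMO to run the demo */
int main(void)
{
    int offsets[MAX_FRAGS], order[MAX_FRAGS], n, i;
    n = fragment_offsets(120, 30, 15, offsets, MAX_FRAGS);
    for (i = 0; i < n; i++) order[i] = i;
    srand(1);
    if (scramble(order, offsets, n, 30, 1000))
        for (i = 0; i < n; i++)
            printf("fragment starting at residue %d\n", offsets[order[i]] + 1);
    return 0;
}
#endif
```

With 30-residue fragments overlapping by 15, a 499-residue parent gives 33 fragments under this scheme, the last starting at residue 481, which is consistent with the "gag 481-499 (33)" label in the sequence figure. Re-shuffling until the check passes is the simplest way to implement the "Yes" branch of the flowchart; a production version would also need to handle fragments from several parent sequences.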
/* Scramble */

/* Includes */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Constant definitions */

/* Version information */

#define VERSION_NO "0.2"
#define VERSION_DATE "04/03/1999"

#define KEYBOARD_BUFFER_SIZE 256   /* size of keyboard read buffer */
#define LEN_CODON 4                /* length of codon (including ... */
#define BUFFER_SIZE 10000          /* size of file read buffer */
#define TRUE 1                     /* boolean true */
#define FALSE 0                    /* boolean false */

/* Error codes */

#define E_NOERROR 0                /* no error */
#define E_NOINFILE 1               /* genes file not found */
#define E_MALLOC 2                 /* memory allocation error */
#define E_FILEREAD 3               /* file read error */
#define E_CREATE_OUTPUT_FILE 4     /* error creating output file */
#define E_OVERLAP 5                /* segment overlap >= length */

/* Structure definitions */

typedef struct gene GENE;
typedef GENE * P_GENE;
typedef struct gene_segment GENE_SEGMENT;
typedef GENE_SEGMENT * P_GENE_SEGMENT;

struct gene {
    char * name;
    char * data;
    P_GENE nextgene;
};

struct gene_segment {
    P_GENE p_gene;
    int number;
    int offset;
    int first_codon_choice;
    char * amino_data;
    char * dna_data;
    P_GENE_SEGMENT next_seg;
};
/* Function prototypes */

int prolog();
int get_parameters();
int read_int(char * prompt);
int load_genes();
int add_gene(char * gene_name, char * gene_data);
void insert_gene(P_GENE * head, P_GENE new_gene);
int add_aa();
int split_genes();
int split_gene(P_GENE g);
int insert_segment(P_GENE_SEGMENT * head_seg, P_GENE_SEGMENT newseg);
int convert_segments_aa_to_dna();
int convert_aa_to_dna(char * aa_ptr, char * dna_ptr, int first_choice);
char * codon(char acid_char, int preferred);
int perform_scramble();
int scramble_segments();
int adjacent_segments();
int display_genes();
int write_output_file();
void strip_newline(char * strip_str);
void pad_amino_string(char * amino_ptr, char * padded_ptr);
int even(int test_num);
void read_str(char * prompt, char * string);
char * read_nonblank_line(char * buf, int buf_size, FILE * in_file);
int user_confirmation();
void test();

/* Global variables */

char * codon_table[26][2] = {
/* A 00 */ {"GCC","GCT"},
/* - 01 */ {"???","???"},
/* C 02 */ {"TGC","TGT"},
/* D 03 */ {"GAC","GAT"},
/* E 04 */ {"GAG","GAA"},
/* F 05 */ {"TTC","TTT"},
/* G 06 */ {"GGC","GGA"},
/* H 07 */ {"CAC","CAT"},
/* I 08 */ {"ATC","ATT"},
/* - 09 */ {"???","???"},
/* K 10 */ {"AAG","AAA"},
/* L 11 */ {"CTG","CTC"},
/* M 12 */ {"ATG","ATG"},
/* N 13 */ {"AAC","AAT"},
/* - 14 */ {"???","???"},
/* P 15 */ {"CCC","CCT"},
/* Q 16 */ {"CAG","CAA"},
/* R 17 */ {"AGG","AGA"},
/* S 18 */ {"AGC","TCC"},
/* T 19 */ {"ACC","ACA"},
/* - 20 */ {"???","???"},
/* V 21 */ {"GTG","GTC"},
/* W 22 */ {"TGG","TGG"},
/* - 23 */ {"???","???"},
/* Y 24 */ {"TAC","TAT"},
/* - 25 */ {"???","???"}
};

char * error_text[] = {
/* 00 */ ""
/* 01 */ ,"ERROR: Input file not found!"
/* 02 */ ,"ERROR: Memory allocation error"
/* 03 */ ,"ERROR: File read error"
/* 04 */ ,"ERROR: Could not create output file"
/* 05 */ ,"ERROR: Segment overlap must be less than segment length"
};

char disease_name[KEYBOARD_BUFFER_SIZE];
char input_file_name[KEYBOARD_BUFFER_SIZE];
char output_file_name[KEYBOARD_BUFFER_SIZE];
int num_genes = 0;
int num_segments = 0;
int len_segment;
int segment_overlap;
P_GENE first_gene = NULL;
P_GENE_SEGMENT first_segment = NULL;
P_GENE_SEGMENT * scrambled_segments = NULL;

/* Mainline */

int main() {
    int error = E_NOERROR;

    printf("Scramble - Version %s, %s\n\n",VERSION_NO,VERSION_DATE);

    /* Initial processing */
    if (!error)
        error = prolog();

    /* Get various program parameters from user */
    if (!error)
        error = get_parameters();

    /* Load genes from genes file */
    if (!error)
        error = load_genes();

    /* Add 'AA' to start and end of all genes */
    if (!error)
        error = add_aa();

    /* Split genes into overlapping chunks */
    if (!error)
        error = split_genes();

    /* Convert segment amino acid to dna */
    if (!error)
        error = convert_segments_aa_to_dna();

    /* Scramble the segments */
    if (!error)
        error = perform_scramble();

    /* Write output file */
    if (!error)
        error = write_output_file();

    /* Show error if there was one */
    if (error)
        printf("%s\n",error_text[error]);

    return error;
}

/* prolog() */

/* Perform any initial processing required */

int prolog() {
    /* Seed the random number generator, using the system clock */
    /* Don't run the program more than once in the same second! */
    /* Or we'll get the same randomisation!!!!!!!!!!!!!!!!!!!!! */
    srand(time(NULL));

    return E_NOERROR;
}

/* get_parameters() */

/* Ask for various parameters from the user (stdin) */
/* Disease name */
/* Input file name */
/* Output file name */
/* Segment length */

int get_parameters() {
    int valid;

    read_str("Enter disease name : ",disease_name);
    read_str("Enter input file name : ",input_file_name);
    read_str("Enter output file name : ",output_file_name);

    valid = FALSE;
    while (!valid) {
        len_segment = read_int("Enter segment length : ");
        if (len_segment % 2)
            printf("Segment length must be even!\n");
        else
            valid = TRUE;
    }

    segment_overlap = len_segment / 2;
    return E_NOERROR;
}

/* load_genes() */
/* Load the genes from the input file */

int load_genes() {
    FILE * input_file;
    char name_buf[BUFFER_SIZE];
    char data_buf[BUFFER_SIZE];
    int rc = E_NOERROR; /* initialised so an empty genes file returns no error */

    /* Open genes file for reading */
    if (NULL == (input_file = fopen(input_file_name,"r")))
        return E_NOINFILE;

    printf("Loading genes from: %s\n",input_file_name);

    num_genes = 0;

    /* Read gene name */
    while (NULL != read_nonblank_line(name_buf,BUFFER_SIZE,input_file)) {
        /* Read the gene data */
        if (NULL != read_nonblank_line(data_buf,BUFFER_SIZE,input_file)) {
            /* Allocate memory for new gene and add to list */
            if ((rc = add_gene(name_buf,data_buf)))
                break;
        }
    }

    /* Close genes file */
    fclose(input_file);

    return rc;
}

/* add_gene() */

/* Allocate memory for new gene, then insert in list */

int add_gene(char * gene_name,char * gene_data) {
    P_GENE new_gene;

    /* Allocate storage for new gene */
    if (NULL == (new_gene = malloc(sizeof(GENE))))
        return E_MALLOC;
    /* Initialise new gene */
    new_gene->next_gene = NULL;
    /* Allocate storage for gene name (+1 for null) */
    if (NULL == (new_gene->name = malloc(strlen(gene_name)+1)))
        return E_MALLOC;
    /* Store gene name */
    strcpy(new_gene->name,gene_name);
    /* Allocate storage for gene data (+1 for null) */
    if (NULL == (new_gene->data = malloc(strlen(gene_data)+1)))
        return E_MALLOC;
    /* Store gene data */
    strcpy(new_gene->data,gene_data);
    /* Insert the new gene into linked list */
    insert_gene(&first_gene,new_gene);
    /* Increment num_genes */
    num_genes++;

    return E_NOERROR;
}

/* insert_gene() */

/* Insert gene into linked list */

void insert_gene(P_GENE * head_gene,P_GENE new_gene) {
    P_GENE * cur_ptr = head_gene;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_gene);

    *cur_ptr = new_gene;
}

/* add_aa() */

/* Add 'AA' to the start and end of every gene */

int add_aa() {
    P_GENE cur_gene = first_gene;
    char * new_data;

    while (NULL != cur_gene) {
        /* Allocate storage to fit the gene plus four characters */
        if (NULL == (new_data = malloc(strlen(cur_gene->data)+5)))
            return E_MALLOC;

        /* Shift gene data to new storage, add "AA" */
        strcpy(new_data,"AA");
        strcat(new_data,cur_gene->data);
        strcat(new_data,"AA");

        /* Free previous gene data storage */
        free(cur_gene->data);

        /* Set gene data pointer to new storage */
        cur_gene->data = new_data;

        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    return E_NOERROR;
}

/* split_genes() */

/* Split the genes into overlapping segments */

int split_genes() {
    P_GENE cur_gene = first_gene;
    P_GENE_SEGMENT cur_seg = first_segment;

    printf("Splitting genes into segments...\n");

    /* Split the genes into segments */
    while (NULL != cur_gene) {
        /* Split the gene */
        split_gene(cur_gene);
        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    /* Count the number of segments */
    num_segments = 0;
    cur_seg = first_segment;
    while (NULL != cur_seg) {
        num_segments++;
        cur_seg = cur_seg->next_seg;
    }

    return E_NOERROR;
}

/* split_gene() */

/* Split a gene into overlapping segments */

int split_gene(P_GENE g) {
    char * seg_ptr;
    char * seg_buf;
    P_GENE_SEGMENT new_segment = NULL;
    int done;
    int seg_ctr = 0;

    /* Allocate memory for segment buffer */
    if (NULL == (seg_buf = malloc(len_segment+1)))
        return E_MALLOC;

    /* Insert a null at the end of the segment buffer, */
    /* so we can use it as a string */
    seg_buf[len_segment] = '\0';

    /* Set segment pointer to start of gene data */
    seg_ptr = g->data;

    done = FALSE;
    while (!(done)) {
        /* So we know if we copied data */
        seg_buf[0] = '\0';

        /* Copy a segment of gene data to the segment buffer */
        /* (strncpy stops at the end of the gene data, so the final, */
        /* shorter segment is not read past the terminating null) */
        strncpy(seg_buf,seg_ptr,len_segment);

        /* If there was some gene data copied to the buffer */
        if ('\0' != seg_buf[0]) {
            /* Allocate storage for a new segment */
            if (NULL == (new_segment = malloc(sizeof(GENE_SEGMENT))))
                return E_MALLOC;
            /* Increment segment counter */
            seg_ctr++;

            /* Setup the new segment */
            new_segment->p_gene = g;
            new_segment->number = seg_ctr;
            new_segment->offset = seg_ptr - g->data + 1;
            new_segment->next_seg = NULL;



Figure 25 (Cont) 



WO 01/090197 PCT/AU01/00622 






            if (NULL == (new_segment->amino_data = malloc(len_segment+1)))
                return E_MALLOC;
            if (NULL == (new_segment->dna_data = malloc(len_segment*3+1)))
                return E_MALLOC;
            new_segment->amino_data[0] = '\0';
            new_segment->dna_data[0] = '\0';
            /* Copy segment data from buffer to new segment */
            strcpy(new_segment->amino_data,seg_buf);
            /* Insert new segment into chain from gene */
            insert_segment(&first_segment,new_segment);
        }

        /* If we didn't read a full segment, we are finished! */
        if (strlen(seg_buf) < len_segment)
            done = TRUE;
        /* Otherwise, advance segment pointer to next segment in buffer */
        else
            seg_ptr = seg_ptr + len_segment - segment_overlap;
    }

    return E_NOERROR;
}

/* insert_segment() */

/* Insert a segment node at the end of the list */

int insert_segment(P_GENE_SEGMENT * head_seg,P_GENE_SEGMENT new_seg) {
    P_GENE_SEGMENT * cur_ptr = head_seg;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_seg);

    *cur_ptr = new_seg;

    return E_NOERROR;
}
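
The windowing performed by split_gene() can be checked in isolation. The following is a minimal sketch and not part of the original listing: count_segments and copy_segment are hypothetical helper names, and the padded length of 3015 residues used in the note below is an assumption, chosen because it reproduces the 201 segments reported in the sample output for a segment length of 30 and an overlap of 15.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of windows produced when a sequence of total_len residues is cut
   into windows of seg_len residues that overlap by `overlap` residues.
   As in the listing, the last window may be shorter than seg_len. */
size_t count_segments(size_t total_len, size_t seg_len, size_t overlap)
{
    size_t step, start = 0, count = 0;

    assert(seg_len > 0 && overlap < seg_len);   /* otherwise no progress */
    step = seg_len - overlap;

    while (start < total_len) {
        count++;
        if (total_len - start < seg_len)        /* short final window */
            break;
        start += step;
    }
    return count;
}

/* Copy window number `index` (0-based) of seq into out, which must hold
   seg_len + 1 bytes. Returns the number of residues copied. */
size_t copy_segment(const char *seq, size_t index, size_t seg_len,
                    size_t overlap, char *out)
{
    size_t total_len = strlen(seq);
    size_t start, n = 0;

    assert(seg_len > 0 && overlap < seg_len);
    start = index * (seg_len - overlap);
    if (start < total_len) {
        n = total_len - start;
        if (n > seg_len)
            n = seg_len;
        memcpy(out, seq + start, n);
    }
    out[n] = '\0';
    return n;
}
```

Under these assumptions count_segments(3015, 30, 15) gives 201, and window k (0-based) starts at the 1-based offset 15k + 1, which agrees with the offsets 1, 16, 31 and so on in the sample output.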

/* convert_segments_aa_to_dna */

/* Go thru segments, and for each, convert amino acids to dna */

int convert_segments_aa_to_dna() {
    P_GENE_SEGMENT cur_seg = first_segment;
    int first_choice = 1;
    int alternate;

    printf("Converting to DNA...\n");

    /* Work out if we need to alternate the first codon choice or not */
    /* Don't need to do this anymore, since the segment length is */
    /* forced to be even, and the overlap is half the length (odd). */
    /* alternate = ((even(len_segment) && even(segment_overlap))
                 || (!even(len_segment) && !even(segment_overlap))); */
    alternate = FALSE;

    while (NULL != cur_seg) {
        cur_seg->first_codon_choice = first_choice;
        convert_aa_to_dna(cur_seg->amino_data,cur_seg->dna_data,
                          cur_seg->first_codon_choice);
        /* Address next segment */
        cur_seg = cur_seg->next_seg;

        /* If we are alternating, alternate the first codon choice */
        /* if (alternate)
               if (1 == first_choice)
                   first_choice = 2;
               else
                   first_choice = 1; */
    }

    return E_NOERROR;
}

/* convert_aa_to_dna */

/* Converts a string of amino acid to dna */
/* NOTE: assumes that buffer at dna_ptr is large enough to hold dna!!! */

int convert_aa_to_dna(char * aa_ptr,char * dna_ptr,int first_choice) {
    char * p_codon;
    int cur_preferred = first_choice;

    while ('\0' != *aa_ptr) {
        p_codon = codon(*aa_ptr,cur_preferred);
        strcat(dna_ptr,p_codon);

        /* If we didn't find a codon, log a warning */
        if (0 == strcmp(p_codon,"???\0"))
            printf("WARNING: no codon found for amino acid!\n");

        /* Alternate current preferred codon */
        if (1 == cur_preferred)
            cur_preferred = 2;
        else
            cur_preferred = 1;

        aa_ptr++;
    }

    return E_NOERROR;
}
/* codon */

/* Returns a pointer to a codon corresponding to the amino acid passed */
/* The codon pointer is to 3 characters, plus a terminating null */

char * codon(char acid_char,int preferred) {
    int codon_table_index;
    char * codon_ptr;

    /* Determine index into codon table (table starts at 'A') */
    codon_table_index = acid_char - 'A';

    /* Set pointer to appropriate codon */
    codon_ptr = codon_table[codon_table_index][preferred-1];

    return codon_ptr;
}
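
The way convert_aa_to_dna() and codon() alternate between the first and second codon choice at successive residues can be sketched on its own. This is an illustrative sketch rather than the listing's code: CODONS and aa_to_dna are hypothetical names, the codon pairs are those legible in the codon table of the listing, and, unlike the listing, characters outside 'A' to 'Z' are bounds-checked instead of being used directly as a table index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two codon choices per amino acid letter; letters without an entry are
   left as NULL. Pairs taken from the legible entries of the listing. */
static const char *const CODONS[26][2] = {
    ['A' - 'A'] = {"GCC", "GCT"}, ['C' - 'A'] = {"TGC", "TGT"},
    ['D' - 'A'] = {"GAC", "GAT"}, ['E' - 'A'] = {"GAG", "GAA"},
    ['F' - 'A'] = {"TTC", "TTT"}, ['G' - 'A'] = {"GGC", "GGA"},
    ['H' - 'A'] = {"CAC", "CAT"}, ['I' - 'A'] = {"ATC", "ATT"},
    ['K' - 'A'] = {"AAG", "AAA"}, ['L' - 'A'] = {"CTG", "CTC"},
    ['M' - 'A'] = {"ATG", "ATG"}, ['N' - 'A'] = {"AAC", "AAT"},
    ['P' - 'A'] = {"CCC", "CCT"}, ['Q' - 'A'] = {"CAG", "CAA"},
    ['R' - 'A'] = {"AGG", "AGA"}, ['S' - 'A'] = {"AGC", "TCC"},
    ['T' - 'A'] = {"ACC", "ACA"}, ['V' - 'A'] = {"GTG", "GTC"},
    ['W' - 'A'] = {"TGG", "TGG"}, ['Y' - 'A'] = {"TAC", "TAT"}
};

/* Translate amino acids to DNA, switching between codon choice 1 and 2 at
   every residue. dna must hold 3 * strlen(aa) + 1 bytes. Returns the number
   of residues that had no codon (these are written as "???"). */
size_t aa_to_dna(const char *aa, char *dna, int first_choice)
{
    size_t missing = 0;
    int choice = first_choice;

    assert(first_choice == 1 || first_choice == 2);
    for (; *aa != '\0'; aa++) {
        const char *codon = NULL;

        if (*aa >= 'A' && *aa <= 'Z')
            codon = CODONS[*aa - 'A'][choice - 1];
        if (codon == NULL) {
            codon = "???";
            missing++;
        }
        memcpy(dna, codon, 3);
        dna += 3;
        choice = (choice == 1) ? 2 : 1;   /* alternate for the next residue */
    }
    *dna = '\0';
    return missing;
}
```

With first_choice set to 1, the residues AAMSTNPKPQ translate to GCCGCTATGTCCACCAATCCCAAACCCCAA, which agrees with the start of the DNA line printed for segment 1 in the sample output.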

/* display_genes() */

/* Display the name and data for all genes */

int display_genes() {
    P_GENE cur_gene = first_gene;

    while (NULL != cur_gene) {
        printf("%s\n",cur_gene->name);
        printf("%s\n",cur_gene->data);
        cur_gene = cur_gene->next_gene;
    }

    return E_NOERROR;
}

/* perform_scramble() */

/* Scramble the segments */
/* Check for adjacent segments, if there are, rescramble */

int perform_scramble() {
    int done = FALSE;
    int rc = E_NOERROR;

    while (TRUE) {
        rc = scramble_segments();
        if (E_NOERROR == rc)
            if (adjacent_segments()) {
                printf("Adjacent segments detected! Rescramble? (y/n) ");
                if (!user_confirmation()) {
                    printf("WARNING: Adjacent segments in output file.\n");
                    break;
                }
            }
            else
                break;
        else
            break;
    }

    return rc;
}

/* scramble_segments() */

/* Randomly scramble the segments, putting pointers in scrambled_segments[] */

int scramble_segments() {
    P_GENE_SEGMENT cur_seg = first_segment;
    int i,j;
    P_GENE_SEGMENT temp;

    printf("Scrambling segments...\n");

    /* Allocate storage for array of segment pointers */
    if (NULL == (scrambled_segments = malloc(sizeof(P_GENE_SEGMENT) * num_segments)))
        return E_MALLOC;

    /* First, initialise scrambled_segments in same order as linked list */
    i = 0;
    while (cur_seg != NULL) {
        scrambled_segments[i] = cur_seg;
        cur_seg = cur_seg->next_seg;
        i++;
    }

    /* Now, randomly scramble the segments */
    for (i=0;i<num_segments;i++) {
        j = rand() % num_segments;
        temp = scrambled_segments[i];
        scrambled_segments[i] = scrambled_segments[j];
        scrambled_segments[j] = temp;
    }

    return E_NOERROR;
}
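
The loop in scramble_segments() swaps every position with an index drawn from the whole array, which does not give each possible ordering the same probability. A commonly used alternative is the Fisher-Yates shuffle. The sketch below illustrates that swapped-in technique and is not the method of the listing; shuffle_indices is a hypothetical name, and it permutes plain indices rather than segment pointers to stay self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Shuffle order[0..n-1] in place with the Fisher-Yates method: position
   i-1 is swapped with a position drawn from 0..i-1 only, so elements that
   have already been placed are never disturbed again. */
void shuffle_indices(size_t *order, size_t n)
{
    size_t i;

    assert(order != NULL || n == 0);
    for (i = n; i > 1; i--) {
        /* 0 <= j < i; rand() % i still carries a slight modulo bias */
        size_t j = (size_t)rand() % i;
        size_t tmp = order[i - 1];

        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```

As in prolog(), the caller is assumed to seed the generator once with srand() before shuffling.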

/* adjacent_segments() */

/* Determine if the scrambled segment order has resulted in */
/* two segments which were adjacent originally (ie every */
/* second one) have ended up adjacent. */

int adjacent_segments() {
    int i;
    int rc = 0;
    P_GENE_SEGMENT cur_seg;
    P_GENE_SEGMENT next_seg;

    for (i=0;i<num_segments-1;i++) {
        /* Address current and next segments */
        cur_seg = scrambled_segments[i];
        next_seg = scrambled_segments[i+1];

        /* Do segments come from same gene, and are two apart? */
        if (((cur_seg->p_gene == next_seg->p_gene)
            && ((cur_seg->number == (next_seg->number)+2)
            || (cur_seg->number == (next_seg->number)-2))))
            return 1;
    }

    return 0;
}
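
Because the overlap is half the segment length, segments n and n+2 of the same gene are contiguous in the parent sequence, which is why adjacent_segments() looks for a difference of exactly two. The check can be sketched on a reduced data structure as follows. The names seg_id and has_adjacent are hypothetical, and the gap is passed as a parameter only for illustration; the listing fixes it at 2.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the listing's gene_segment: the gene a segment
   came from and its 1-based position within that gene. */
struct seg_id {
    int gene;
    int number;
};

/* Return 1 if any two neighbours in the scrambled order come from the same
   gene and were exactly `gap` segments apart originally, otherwise 0. */
int has_adjacent(const struct seg_id *order, size_t n, int gap)
{
    size_t i;

    assert(gap > 0);
    for (i = 0; i + 1 < n; i++) {
        int diff = order[i].number - order[i + 1].number;

        if (order[i].gene == order[i + 1].gene
            && (diff == gap || diff == -gap))
            return 1;
    }
    return 0;
}
```

A caller could reshuffle until has_adjacent() returns 0, which would automate the y/n prompt used in perform_scramble().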

/* write_output_file() */

/* Write out segments (in initial non-scrambled order) */
/* Write out synthetic protein (in scrambled order) */
/* Write out synthetic dna (in scrambled order) */

int write_output_file() {
    FILE * output_file;
    char * amino_buffer;
    P_GENE_SEGMENT cur_seg;
    int i;

    /* Open output file for writing (erase any contents) */
    if (NULL == (output_file = fopen(output_file_name,"w")))
        return E_CREATE_OUTPUT_FILE;

    /* Allocate memory for padded amino string buffer */
    if (NULL == (amino_buffer = malloc(len_segment*3+1)))
        return E_MALLOC;

    printf("Writing output file: %s\n",output_file_name);

    /* Write output file header information */
    /* (separator strings below are not legible in the source listing) */
    fprintf(output_file,"Scramble %s - Output File\n",VERSION_NO);
    fprintf(output_file,"\n");
    fprintf(output_file,"Disease name : %s\n",disease_name);
    fprintf(output_file,"Input filename : %s\n",input_file_name);
    fprintf(output_file,"Output filename : %s\n",output_file_name);
    fprintf(output_file,"Number genes : %d\n",num_genes);
    fprintf(output_file,"Number segments : %d\n",num_segments);
    fprintf(output_file,"Segment length : %d\n",len_segment);
    fprintf(output_file,"Segment overlap : %d\n",segment_overlap);

    /* Write out segments in initial non-scrambled order */
    fprintf(output_file,"\n");
    fprintf(output_file,"Segments in original order\n");
    fprintf(output_file,"\n");

    cur_seg = first_segment;
    while (NULL != cur_seg) {
        /* Format amino data to line up with codons */
        pad_amino_string(cur_seg->amino_data,amino_buffer);
        fprintf(output_file,"Gene : %s\n",cur_seg->p_gene->name);
        fprintf(output_file,"Segment# : %d\n",cur_seg->number);
        fprintf(output_file,"Offset : %d\n",cur_seg->offset);
        fprintf(output_file,"1st Codon : %d\n",cur_seg->first_codon_choice);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",cur_seg->dna_data);
        fprintf(output_file,"\n");
        cur_seg = cur_seg->next_seg;
    }

    /* Write out segment names in scrambled order */
    fprintf(output_file,"Segments in scrambled order\n");
    fprintf(output_file,"\n");

    for (i=0;i<num_segments;i++) {
        /* Format amino data to line up with codons */
        pad_amino_string(scrambled_segments[i]->amino_data,amino_buffer);
        /* Write segment details */
        fprintf(output_file,"%s #%d\n",scrambled_segments[i]->p_gene->name,
                scrambled_segments[i]->number);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",scrambled_segments[i]->dna_data);
        fprintf(output_file,"\n");






    }

    /* Write synthetic protein in one long string */
    fprintf(output_file,"Synthetic Protein:\n");
    fprintf(output_file,"\n");
    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->amino_data);
    fprintf(output_file,"\n\n");

    /* Write synthetic dna in one long string */
    fprintf(output_file,"Synthetic DNA:\n");
    fprintf(output_file,"\n");
    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->dna_data);

    return E_NOERROR;
}

/* strip_newline() */

/* Replace the first newline character with a null */

void strip_newline(char * strip_str) {
    char * newline_pos;

    /* Find the newline char */
    newline_pos = strchr(strip_str,'\n');

    /* If we found one, replace it with a null */
    if (NULL != newline_pos)
        newline_pos[0] = '\0';
}

/* pad_amino_string */

/* Copy amino chars from amino_ptr to padded_ptr, padding each */
/* side with a space. */

void pad_amino_string(char * amino_ptr, char * padded_ptr) {
    while ('\0' != *amino_ptr) {
        *padded_ptr = ' ';
        padded_ptr++;
        *padded_ptr = *amino_ptr;
        padded_ptr++;
        *padded_ptr = ' ';
        padded_ptr++;
        amino_ptr++;
    }

    /* Stick a null at the end of the padded string */
    *padded_ptr = '\0';
}

/* even() */
/* True if test_num is even, otherwise false */

int even(int test_num) {
    return !(test_num % 2);
}

/* read_int() */

/* Read an integer from stdin. Keep trying until valid int > 0 entered. */
/* Return the integer read, or 0 if error reading from stdin. */

int read_int(char * prompt) {
    char buffer[KEYBOARD_BUFFER_SIZE];
    int value_read;
    int valid = FALSE;

    while (!valid) {
        printf("%s",prompt);
        valid = TRUE;
        fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
        if (1 != sscanf(buffer,"%d",&value_read))
            valid = FALSE;
        if (valid && (value_read < 1))
            valid = FALSE;
        if (!valid)
            printf("Positive integer value please!\n");
    }

    return value_read;
}

/* read_str() */

/* Read a string from the user (stdin) */
/* Strip the newline from it */

void read_str(char * prompt,char * string) {
    char buffer[KEYBOARD_BUFFER_SIZE];

    /* Print the prompt as data, not as a format string */
    printf("%s",prompt);
    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    sscanf(buffer,"%s",string);
}

/* read_nonblank_line() */

/* Read a line from file until we get a non-blank one */

char * read_nonblank_line(char * buf,int buf_size,FILE * in_file) {
    char * return_ptr;

    /* Read lines until we get a non-blank one, or EOF */
    do
        return_ptr = fgets(buf,buf_size,in_file);
    while ((NULL != return_ptr) && (('\n' == buf[0]) || (' ' == buf[0])));

    /* If we got a line, change the newline char to a null */
    if (NULL != return_ptr)
        strip_newline(buf);

    return return_ptr;
}

/* user_confirmation() */

/* Read input from user. If user types y, return 1, otherwise 0 */

int user_confirmation() {
    char buffer[KEYBOARD_BUFFER_SIZE];

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    if (('Y' == buffer[0]) || ('y' == buffer[0]))
        return 1;
    else
        return 0;
}

/* test() */

/* For debugging/development */

void test() {
    char str[100];

    printf("Enter something: ");
    fgets(str,100,stdin);
    printf("line1\n");
    printf("%s",str);
    printf("line2\n");
    fgets(str,100,stdin);
}
HepC Savine design



HepC 1a consensus polyprotein sequence used for scramble program

MSTNPKPQRKTKRirrNRRPQDVKFP^ I PKARRPEGRTWAQ 

PGYPWPL YGNEGCGWAGWLLS PRGSRPSWGPTDPRRRSRNI^ OTI DTLTCGFADLMG Y I PLVGAPLGGAARAIiAHGVR 
VLRDGVNYATGNLPGCSFS I FLLALLSCLTVPASAYQVRNSTGLYHVTNDCPNSS I VYEAADAI LHTPGCVPCVREGN 
ASRCTWAMTPTWmUX3KLPATQIJ*RHIDLLVG 

I TGHRMAWDMMMNWS PTAALVMAQLLR I PQAI LDMI AGAHWGVLAG I A Y FSMVGNW AKVL VVLL»L FAGVD AETHVTGG 
NAGRTTSGLVSLLTPGAKQNIQLINTOGSWHINSTAIJ^^ 
WGPISYANGSGPDQRPYCWHYPPKPCGIVPAKSVCTC^ 
GNWFGCTWMNSTGFTKVCGAPPCVIGGAGNNTIJICPT^ 

T I FKVRM YVGGVBHRLEAACNWTRGERCDLEDRDRSBLS PLLLSTTQWQVLPCS FTTLPALSTGLIHLHQNIVDVQYL 

YGVGSS IASWAI KWBYVVLLFT^LADARVCSCLWMM^ 

RWVPGAVYALYGMWPTiLLIjLIiALPQRAYAIjyrEVAASCXjGVVLVG 

VWVPPLl^niGGRDAVILI>!CVVHPTLVFDITKLLLAVro 

AIIKLGALTGTYVYNHLTPUUWAHNGIJ^ 

ADGMVSKGWRLLAPITAYAQQTRGLLGCI I TSLTGRDKNQVEGHVQ I VSTAAQTFLATC INGVCWTVYHGAGTRTIAS 
PKGPVIQMYTOVDQDLVGWPAPQGSRSLTPCTCX3SSDLYLVTRHADVI PVRRRGDSRGSLLSPRPI SYLKGSSGGPLL 
CPAGHAVG I FRAAVCTRGVAKAVDF I PVENIiETTMRS PVFTDNS S PPAVPQS FQVAHLHAPTGSGKSTKVPAAYAAQG 
YIOHjVLNPSVAATICFGAYMSKAHGIDPNIRTGVRTITTGSPITO 
LGIGTVLDQAETAGARIiVVIATATPPGSVTVPHPNIBBVA^ 

AKLVALGINAVAYYRGLDVSVI PTSGDWWATDALMTGYTGDFDS VI DCNTCVTQTVDFSIjDPTFTIBTTTLPQDAV 

srtqrrgrtgrgkpgiyrfvapgbrpsgmfdssvlcecydagc^wyeltpab^^ 

VFTGLTHI DAHFLS QTKQSGEOTP YTjVAYQATVCARAQAPPPS WIXJMWKCXIRL PTPLL YRLGAVQKEVTLT 

HPVTKYIMTCMSADLBVVTSTWVLVGGVLAAIJUVYCX^ 

YIBQGMMIJVEQFKQKAIjGLLQTASRQAEVIAPAVCJTNWQKLEVFW 

AAVTSPIjTTSQTIiLFNI IX3GWVAAQIJUVPGAATAFVGAGLAGAAJGSV 

VPSTBDLVNLLPAILSPGALVVGWCAAIIJIRHVGPGBGAVQW^ 

LTVTQLLRRLHQWISSBCITPCSGS^ 

CTOSAEITGHVTCNGTMRIVGPRTCRNMWSGT^ 

TDNLKCPC^vTSPEPFTBIJX5VTU*HRFAPPCOT 

AAGRRIARGSPPSMASSSASQLSAPSIJCATCTANHDSPEABLIEANI^WRQEM 

BDBRBISVPABIIiRKSRRFAQALPVWARPDYKPPLVETWKKPDYBPPVVHGCPLPPP 

STAIAELATKSFGSSSTSGITGDNTTTSSBPAPSGC^^ 

CCSMSYSWTGALVTPCAAEEQKIjPINAIjSNSLLRHHNLVYSTT^ 

KVKANLLSVBEACSLTPPHSAKSKFGYGAKDVRCHARKAVAHINSVWKDL^ 

KPARLIWPDIXSVRVCTKMALYDVVSKLPIAVMGSSYGFQYSPGQRVBFLVQAWK^ 

IRTEEAIYQCCDIJDPQARVAIKSLTERIjYvTXSP^ 

CTMLVCXn>DLVVICBSAGVQBDAASLRAFTBAMTRYSAPPGDPPQPBYDLBLI 
TTPLARAAWBTAIUrrPVNSWLGNIIMFAPTLWARMIIJfrH^ 
HGLSA^SLHSYSPGEINRVAACLRICLGVPPL^ 
LDI^GWFTAGYSGGDIYHSVSHARPRWFWFCUiLIAAGVGIYLLPra 



Scramble - Output File

Scramble version : 0.1 beta, 08/02/1999
Num. genes : 1

Num. segments : 201
Segment length : 30
Segment overlap : 15

Segments in original order: 



Gene : HepCla 

Segment* i 1 
Offset : 1 
1st Codon : 1 

AAMSTNPKPQRKTKRHTNRRPQDVKFPGGG 
GCCGCTATGTCCACCAATCCC^AACCCCAAA^ 
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Gene : HepCla 

Segment* : 2 
Offset : 16 
1st Codon : 1

NTNRRPQDVKPPGGGQXVGGVYLLPRRGPR 
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGC^^ 



Gene : HepCla
Segment# : 3
Offset : 31
1st Codon : 1

QIVGGVYLLPRRGPRLGVRATRKTSBRSQP 
CAGATTGTGGGAGGCGTCTACCTCCTGCCTAG 



Gene : HepCla 

Segment* : 4 
Offset : 46
1st Codon : 1 

LGVRATRKTSBRSQPRGRRQ 
CTGGGAGTGAGAGCCAC 



PI PKARRPKG 



Gene : HepCla 

Segment * : 5 
Offset : 61 
1st Codon : 1

RGRRQPI PKARR P8GRTWAQ PGY P H PLYGN 
AGGGGAAGGAGACAGCCTATCCCTAAGGCT^ 



Gene : HepCla 

Segment* : 6 
Offset : 76 
1st Codon : 1 

RTWAQPGYPWPLYGNBGCGNAGNLLSPRGS 
AGGACATOQGCniAGCCTGGCTATCCCTGGCCCCTC TA 



Gene : HepCla 

Segment* : 7 
Offset : 91 
1st Codon : 1 

BGCGWAGNLLS PRGSRPSWG PTD PRRRSRN 



Gene : HepCla
Segment* : 8 
Offset : 106 
1st Codon : 1 

RPSKGPTDPRRRSRN LGKVI 0TLTCGPADL 
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAA 



Gene : HepCla 

Segment* : 9 
Offset : 121 
1st Codon : 1 

LGKVIDTLTCGPADLMGYI P r V G A P L G G A A 
CTGGGAAAGGTCATCGATACCCTCACCTGTGGCTTTGC^ 



Gene : HepCla 

Segment* : 10 
Offset : 136 
1st Codon : 1 

MGYIPLVGAPLGGAARALAHGVRVLBDGVN 
ATGGGATACATTCCCCTCGTGGGAGCCCCTCr^ 



Gene : HepCla 

Segment* : 11 
Offset : 151 
1st Codon : 1 

RALAHGVRVLBDGVNYATGNLPGCSPSI PL 
AGGGCTCTGGCTCACGGAGTGAGAGTGCTCGAGGA 



Gene : HepCla 

Segment* : 12 
Offset : 166 
1st Codon : 1

YATGNLPGCSFSIFLLALLSCLTVPASAYQ 
TACGCTAOCGGAAACCTCCCCGGATGCTCXTTCTCCATC^ 



Gene : HepCla 

Segment! : 13 
Offset : 181 
1st Codon : 1 

LALLSCLTVPASAYQVRNSTGLYHVTNDCP 
CTGGCTCTGCTCAGCTGTCTGACAGTGCCTC 

Gene : HepCla 

Segment# : 14
Offset : 196 

1st Codon : 1 

VRHSTGLYHVTNDCPNSS IVYBAADAILHT 
GTGAGAAACTCCACCGGACTGTATCACGTOICCAATGA 

Gene : HepCla 

Segment! : 15 
Offset : 211 
1st Codon : 1 

NSSIVYBAADAILHTPGC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCC^ 



Gene : HepCla 

Segment! s 16 
Offset : 226 
1st Codon : 1 

PGCVPCVRBGNASRCWVAMTPTVATRDGKL 
CX^CGGATGCGTCCCCTGTtnGAGAGAGGGAAACG 



Gene : HepCla 

Segment! t 17 
Offset : 241 
1st Codon : 1 

WVAMTPTVATRDGKLPATQLRRH I DLLVG 
TGGGTCGCCATGACCCCTALXVrCUCtaCAAGGGAT 



Gene : HepCla 

Segment# : 18
Offset : 256
1st Codon : 1 

PATQLRRHIDLLVGSATLCSALYVGDLCGS 

Gene t HepCla 

Segment! : 19 
Offset : 271 
1st Codon : 1 

ATLCSALYVODLCGSVFLVGQLFTFSPRRH 

Gene : HepCla 

Segment! : 20 
Offset : 286
1st Codon : 1 

VFLVGQLFTFS PRRHWTTQGCNCS IYPGH I 

GAGACACTGGACCACACAGGGATGCAATTGCTCCATCTATCCCGGA^ 



Gene : HepCla 

Segment# : 21
Offset : 301
1st Codon : 1 

WTTQGCHCSIYPGHITGHRMAWDMMMNWSP 
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCXAT^ 



Gene : HepCla 

Segment! : 22 
Offset : 316 
1st Codon : 1 

TGHRMAWDMMMMWS PTAALVMAQLLRI P 0 A 
ACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTTC 
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Gene x HepCla 

Segment # : 23 
Offset : 331 * 
1st Codon : 1 

TAALVMAQLLR I PQAI LDMIAGAHWGVLAG 

ACCGCTGCCCTCCTGAraXXXA^ 

Gene : HepCla 

Segment* : 24 
Offset t 346 
1st Codon : 1 

ILDMIAGAHWGVLAGIAYFSMVGNWAKVLV 

ATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCC^ 

Gene : HepCla 

Segment* : 25 
Offset : 361 
1st Codon : 1 

IAYPSMVGHWAKVLVVLLLPAGVDABTHVT 
ATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTG 

Gene : HepCla 

Segment* : 26 
Offset : 376 
1st Codon : 1 

VLLLPAGVDABTHVTGGHAGRTTSGLVSLL 

Gene : HepCla 

Segment* i 27 
Offset : 391 
1st Codon : 1 

GGNAGRTTSGLVS LLTPGAKQNZ QLINTNG 
GGOGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGO^ 

Gene : HepCla 

Segment* : 28 
Offset : 406 
1st Codon : 1 

TPGAKQHIQLI NTNGSWHIMSTALNCNBSL 
ACCCCTGGCCXrTAAGCAAAACATTCAGCTCATaU^ 

Gene : HepCla 

Segment* : 29 
Offset : 421 
1st Codon : 1 

SNH INSTALMCNB SLNTGtfLAGLPYQHKFN 
AGCTGGCACATTAACTCCACCGCrCrGAArTGCAATGAGTCCCTGA^ 

Gene : HepCla 

Segment* : 30 
Offset : 436 
1st Codon : 1 

MTGHLAGLPYQHKFNSSGCPBRLASCRRLT 

Gene : HepCla 

Segment* : 31 
Offset : 451 
1st Codon : 1 

SSGCPBRLASCRRI*TDPDQGWGP ISYANGS 
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCTCACCGATTTC^ 

Gene : HepCla 

Segment* : 32 
Offset : 466 
1st Codon : 1 

DPDQGWGPI SYANGSGPDQRPYCWHYP PKP 
GACTITGACCAAGGCTGGGGCCCTATCTCCrACGCTA 

Gene : HepCla 

Segment* : 33 
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Offset : 481 
1st Codon : 1 

OPDQRPYCWHYPPKPCGIVPAKSVCGPVYC 
CGCCCTGACCAAACGCCTrACTGTTGGCAT^ 



Gene : HepCla

Segment# : 34
Offset : 496 
1st Codon : 1 

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG 
TCCGGAATCGTCXXCGCTAACmXXrityi^ 

Gene : HepCla

Segment# : 35
Offset : 511 
1st Codon : 1 

F T — -- - V - V VGTTDRSGAPTYSW, *ANDTDVPV 
TTCACACCCTOXCCGTCGTCGTCGGCAC^ 

Gene : HepCla 

Segment # : 36 
Offset : 526 
1st Codon : 1 

A P T Y S W GANDTDVPVLMMTRPPLGNWPGCT 
CCCarrACCTATAGCTGGGGCCCTAAC^ 

Gene : HepCla 

Segments : 37 
Offset : 541 
1st Codon : 1 

LNHTRPPLGNH 
CTGAAT 

Gene : HepCla 

Segment* : 38 
Offset : 556
1st Codon : 1 

WMHSTOPTKVCGAPPCVIGGAGMMTLHCPT 

TGGATGAACTCCACCGGATTCACAAAGGTCTGCGCAGC^ 

Gene : HepCla 

Segments : 39 
Offset : 571 
1st Codon : 1 

C V I G GAGHNTLHCPTDCPRKHPBATYSRCG 

TCCCTCATCGGAQGCGCTGGCAATAACACACTGCATT^ 

Gene : HepCla 

Segment! : 40 
Offset : 586 
1st Codon : 1 

P C P RKHPBATYSRCGSGPWITPRCLVDYPY 
GACTGTTTCAGAAAGCATCCCGAAGCOfcCATAC 

Gene : HepCla 

Segments : 41 
Offset : 601 
1st Codon : 1 

SGPWITPRC L V DYPYRLWHYPCTINYTIPK 

AGCGGACCCIGGATCACACCCAGATGCCTCG 

Gene : HepCla 

Segments : 42 
Offset : 616 
1st Codon : 1 

R L N H Y P C T I N Y T I P KVRMYVGGVBHR LBAA 

AGGCTCTGGCATTACCCTTGCACAATCAATTACACA^ 

Gene : HepCla 

Segments : 43 
Offset : 631 
1st Codon : 1 

VRMYVGGVBHRLBAACNMTRGER CDLBDRD 
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GTGAGAATGTATGTKXXSAGGCGTCGAGCATAGGCTCG^ 

Gene i HepCla 

Segment* : 44 
Offset : 646 
1st Codon : 1 

CNWTRGBRCDLEDRDRSELS PLLLSTTQWQ 
TGCAATTGGACftAGGGGJVGAGAGATC£GATC 

Gene : HepCla 

Segment* : 45 
Offset : 661 
1st Codon : 1 

RSBLS PLLLSTTQWQVLPCS FTTL P A L S T G 
AGGT 

Gene : HepCla 

Segment* : 46 
Offset : 676 
1st Codon : 1 

VX*PCS PTTLP ALS TGLIHLHQN IVDVQYLY 
GTGCTCCCCTTnAGCTTTACCACACTGCCTG 

Gene i HepCla 

Segment* : 47 
Offset : 691 
1st Codon : 1 

LIHLHQNIVDVQYLYGVGSS IASWAIKWEY 
CTGATTCACCTCCACCAAAACATTGTGGATCrr^^ 

Gene : HepCla 

Segment* : 48 
Offset : 706 
1st Codon : 1 

GVGSS IASWAIKWBYVVLLPLLLADARVCS 
GOCXnraXXTICCAGCATTGCCTCCTGGGCTATCAAAI^ 

Gene : HepCla 

Segment* : 49 
Offset : 721 
1st Codon : 1 

VVLLP LLLADARVCSCLWMMLLISQABAAL 

GTOGTOritxrrcTrcciantx^^ 

Gene : HepCla 

Segment* : 50 
Offset : 736 
1st Codon : 1 

CLWMHLLISQABAALBNLVI L N A A S LAGTH 
TGCCTCIGGATGATGCTCCTGATTAGCCAAGCCGAA 



Gene : HepCla
Segment* : 51 
Offset : 751 
1st Codon : 1 

BMLVI LNAASLAGTHGLVSPLVFPCFAWYL 
GAGAATCTGGTCATCCrCAACGCTGCCTCCCTGGCTGG^ 

Gene t HepCla 

Segment* : 52 
Offset : 766 
1st Codon : 1 

G L -^Y— ? F L V P P C P AWYLKGRWV PGAVYALYGM 
GGCCTCGTGTCCTTCCTCGTGTTTTTCTG^ 

Gene : HepCla 

Segment# : 53
Offset : 781 
1st Codon : 1 

KGRHVP AVYALYGMWPLLLLLLALPQRAY 

kTGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTGCTC 



Gene : HepCla 
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Segment ft : 54 
Offset : 796 
1st Codon : 1 

WPLLLLLLALPQRAYALDTBVAAS CG6VVL 
TGGCCTCTGCTCCTGCTCCTGCTCGCCCT 

Gene : HepCla 

Segment* : 55 
Offset : 811 
1st Codon : 1 

ALDTBVAASCGGVV LVGLMALTLS P Y Y K R Y 
GCCCTCGACACAGAGGTCGCCGCTAGCTGTGGCGGA^ 

Gene : HepCla 

Segment* : 56 
Offset : 826
1st Codon : 1 

VGLHALTLSPYYXRYISWCLHWLQYFLTRV 

GTGGGACTGATGGCXXrrCACCCTCAGCC!CTTACrATAA^ 

Gene : HepCla 

Segment* : 57 
Offset : 841 
1st Codon : 1 

ISWCLWWLQYFLTRVBAQLHVWVPPLNVRG 

ATCTCCTGGTGTCTGTCGTGGCTCCAGTATTTC 

Gene : HepCla 

Segment* : 58 
Offset : 856 
1st Codon : 1 

BAQLHVHVPPLNVRGGRDAVILLMCVVHPT 
GAGGCTCAGCTCCACGTCTGQGTCXXXXXrTCTCAA 

Gene : HepCla 

Segment* : 59 
Offset : 871 
1st Codon : 1 

0 R D A V 1 L L " C V V H p TLVPDITKLLLAVPGP 

C<XaGAGACGCl\3T<aTlClt^rait;iGTC 

Gene : HepCla 

Segment* : 60 
Offset : 886 
1st Codon : 1 

LVPDXTKLLLAVFGPLNXLQASLX.KVPYPV 
CUW'lVriU ^ TATCAOUlACCT^ 

Gene : HepCla 

Segment* : 61 
Offset : 901 
1st Codon : 1 

LWILQASIiLKVPYPVRVQGLLR ICALARKM 

Gene : HepCla 

Segment* : 62 
Offset : 916 
1st Codon : 1 

RVQGLLRXCALARKMXGGHYVQMAX I K L G A 
AGGGTCCAGGGACTGCTCAGGArrTGCGCTCTGGCTA 

Gene : HepCla 

Segment* t 63 
Offset : 931 
1st Codon : 1 

X G GHYVQHAX XKLGALTGTYVYNHLTPLRD 
ATCXXAGGCCATTACGTCCAGATGGCCATTATCAW^ 

Gene : HepCla 

Segment* : 64 

Offset : 946

1st Codon : 1 
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L T G T Y V Y HHLTPLRDWAHNGLRDLAVAVBP 
CTGACAGGCACATACXnXTTACAATCACCTCACrc 

Gene : HepCla 

Segments : 65 
Offset : 961 
1st Codon : 1

WAHMGLRDIiAVAVBPVVPSQMBTKLITHGA 
TGGGCTCACAATGGCCrCAGGGATCroGCT^^ 

Gene : HepCla 

Segment* : 66 
Offset : 976 
1st Codon : l 

VVPSQHBTKLITWGADTAACGDI I N G L P V S 

G'AU;fClUVlt:caVGATGGAGACAAAGCTCATCAC^TGGGG 

Gene : HepCla 

Segment* : 67 
Offset : 991 
1st Codon : 1 

DTAACGDI IHGLPVSARRCRE I LLG PAOGN 

Gene : HepCla 

Segment* : 68 
Offset : 1006 
1st Codon : 1 

A R R G R B I 
GCCAGAAGGGGAAGGGAAAT 



: HepCla 
Segment* : 69 
Offset : 1021 
1st Codon : 1 

VSKGWRLLAPITAYAQQTRGLLGCI ITSLT 
GTGTCCAAGGGATGGAGACTGCrCGCCCCTATCACAGCCT 

Gene : HepCla 

Segment# : 70
Offset : 1036
1st Codon : 1

QQTRG LLGCIITSLTGRDKMQVBGBVQIVS 
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGC^ 

Gene : HepCla 

Segment* : 71 
Offset : 1051 
1st Codon : 1 

GRDKNQVBGBVQIVSTAAQTPLATCINGVC 
GGCAGAGACAAAAACCAAGTGGAAGGCX3AAGTGCAAATCGT 

Gene : HepCla 

Segment* : 72 
Offset ; 1066 
1st Codon : 1 

T A A Q T P L A TCI HGVCNTVYHGAGTRTIASP 
ACCGCTGCCCAAAOCTTTCTGGCTACCItnATCAATGGCG^ 

Gene : HepCla 

Segment* : 73 
Offset : 1081
1st Codon : 1

II T V Y H G AG TRTXASPKG PV IQMYTNVDQDL 
TGGACAGTGTATCACGGAGCCGGAAOCAGAACCATTGCCI^ 

Gene : HepCla 

Segment* : 74 
Offset : 1096 
1st Codon : l 

K G P V IQMYTHVDQDLVGWPAPQGSRSLTPC 
AAGGGACCCGTCATCCAAATGTATACCAATCTGGATCAGGATCT^ 
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Gene : HepCla 

Segment* : 75 
Offset : 1111 
1st Codon : 1

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD 
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCAC^ 

Gene : HepCla 

Segment* : 76 
Offset : 1126 
1st Codon : 1 

TCGSSDLYLVTRHADVI PVRRRGDSRG S L L 
ACCTGTGGCTCCAC<X^TCTGTATCTG^ 

Gene : HepCla 

Segment* : 77 
Offset : 1141 
1st Codon : 1 

VXPVRRRGDSRGSLLS PRPISYLKG SS GG P 
GTGATTOCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCXrr 

Gene : HepCla 

Segment* t 78 
Offset : 1156 
1st Codon : 1

SPRPISYLKGSSGGPLLCPAGHAVG IPRAA 
AGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCT 

Gene : HepCla 

Segment* : 79 
Offset : 1171 
1st Codon : 1 

LLC PAGHAVGI PRAAVCTRGVAKAVDP I P V 

Gene : HepCla 

Segment* : 80 
Offset : 1186 
1st Codon : 1 

VCTRGVAJCAVDFI PVBHLBTTMR 
GTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTTTATC^ 

Gene : HepCla 

Segment* : 81 
Offset : 1201 
1st Codon : 1 

BMLBTTMRSPVFTDWSSPPAVPQSFQVAHL 
GAGAATCTGGAAACCACAATGAGAAGCCCTGTGTTT^ 

Gene : HepCla 

Segment* : 82 
Offset : 1216 
1st Codon : 1 

SSPPAVPQSPQVAHLHAPTGSGKSTKV PAA 
AGCTCCCCCCtrrGCOGTCCtXr A AAG C ^ 

Gene : HepCla 

Segment* : 83 
Offset : 1231 
1st Codon : 1 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV 
CACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTG 

Gene : HepCla 

Segment* : 84 
Offset : 1246 
1st Codon : 1 

YAAQGYKVLVLMPSVAATLGFGAYMSKAHG 
TACGCTGCCCAAGGCTATAAGGTCCTGGTCCTGAATCCCI^ 

Gene : HepCla 

Segment # : 85
Offset : 1261
1st Codon : 1

AATLGFGAYMSKAHGIDPNIRTGVRTITTG 
TACATGAGCAAAGCCCATGGCATTGACCCTAACATC^ 



Gene : HepCla 

Segment # : 86
Offset : 1276 
1st Codon : 1 

IDPNIRTGVRTXTTGSPITYSTYGKPLADG 
ATCGATCCCAATATCAGAACCGGAGTtiAGAACCAT^ 

Gene : HepCla 

Segment* : 87 
Offset : 1291 
1st Codon : 1 

S P ITYSTYGKFLADGGCSGGAYDI I ICDBC 
AGCCCTATCACATACTCCACXTATGGCAAATTCCTCGCXX^TG 

Gene : HepCla 

Segment* : 88 
Offset : 1306 
1st Codon : 1 

GCSGGAYDI IXCDBCHSTDATSI LGZGTVL 
GGCTGTAGCGGAGGCGCTTACX^TATCATTATCrGTG^ 



Gene : HepCla 

Segment* : 89 
Offset : 1321 
1st Codon : 1

HSTDATS I L G I GTVLDQABTAGARLVVLAT 
CACTCCACCGATGCCACAAGCATTCTGGGAATCGGA^ 



Gene : HepCla 

Segment* : 90 
Offset : 1336 
1st Codon : 1

DQABTAGARLVVLATATPPGSVTVPHPNIB
ITCCCAATATCGAA



Gene : HepCla 

Segment # : 91
Offset : 1351
1st Codon : 1

ATP PGSVTVPH PNIBBVAL3TTGS I PPYGK 

Gene : HepCla 

Segment* : 92 
Offset : 1366 
1st Codon : 1 

EVA LSTTGB I P PYGKAI P LBV I K G G R H I* I P 

ATCCVl\Tlt3GAAGTGATTAAGGGAGGCAGACACCTCATl."l , l"l' 



Gene : HepCla 

Segment* : 93 
Offset : 1381 
1st Codon : 1 

AIPLBVI KGGRHLIPCHSKKKCDBLAAKLV 
GCCATTOCCCTCGAGCTCATCAAAGGCGGAA^ 

Gene : HepCla 

Segment # : 94
Offset : 1396
1st Codon : 1

CHS KKKCDBLAAKLVALG INAVAYYRGLDV 
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTOGTG 



Gene : HepCla 

Segment # : 95
Offset : 1411
1st Codon : 1

ALGINAVAYYRGLDVSVl PTSGDVVVVATD 
GCCCTCGGCATTAACGCTGTGGCITACrATAGGGGA 
Gene : HepCla

Segment # : 96
Offset : 1426
1st Codon : 1

SVIPTSGDVVVVATDALMTGYTGDPDSVID 
AGCGTCATtXXTACCTCCGGCGATGTGGTCGTGGTC^ 

Gene : HepCla 

Segment # : 97
Offset : 1441
1st Codon : 1

ALMTGYTGDPDSVI DCHTCVTQTVDFS LDP 
GCCCTCATGACAGGCTATACCGGAGACTTTGAC^^ 

Gene : HepCla 

Segment # : 98
Offset : 1456
1st Codon : 1

CUT C V TQTVDFSLDPTFTIBTTTLPQDAVS 
TGCAATACCTGTGTGACACAGACAGTGGATTTCTCX^C^ 



Gene : HepCla 

Segment # : 99
Offset : 1471 
1st Codon : 1 

TFTIKTTTLPQDAVSRTQRRGRTGRGKPGI 
ACCTTTACCATTGAGACAACCACACTGCCTOVGGAT^ 

Gene : HepCla 

Segment # : 100
Offset : 1486
1st Codon : 1

RTQRRGRTGRGKP G I YRFVAPGKR PSGMFD 

Gene : HepCla

Segment # : 101
Offset : 1501 
1st Codon : 1 

YRFVAPG BRPSGHPDSSVLCBCYDAGCAWY 
TACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCGGA AlVlTXt^ 



Gene : HepCla 

Segments : 102 
Offset : 1516 
1st Codon : 1 

SSVLCBCYDAGCAWYSLTPABTTVRLRAYM 
AGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTG 



Gene : HepCla 

Segments : 103 
Offset : 1531 
1st Codon : 1 

8LTPABTTVRLRAYHHTPGLPVCQDHLB FN 

GAGCTCACCCCTGCXXaAAACCACAGTGAGACTG^ 



Gene : HepCla 

Segments : 104 
Offset : 1546 
1st Codon : 1 

MTPGLPVCQDHLBFWBGVFTGLTH IDAH PL 

ATCACCTOGA Gn^GG GAAGGCGTCrTCACAGGCCrCACCCATATOGATG 



Gene : HepCla 

Segments : 105 
Offset : 1561 
1st Codon : 1 

BGVFTGLTHIDAHFLSQTKQSGBHPPYLVA 
GAGGGAGTGTTTACCGGACTGACACACATTGACGCTCACCT 

Gene : HepCla 

Segments : 106 
Offset : 1576
1st Codon : 1

SQTK.QSGBNFPYLVAYQATVCARAQAP 

Gene : HepCla 

Segment # : 107
Offset : 1591 
1st Codon : 1 

YQATVCARAQAPPPSWDQMWKCLIRLKPTL 

Gene : HepCla 

Segment # : 108
Offset : 1606 
1st Codon : 1 

HDQMWKCLX RLKPTLHGPTPLLYRLGAVQN 
TGGGATCAGATCTGGAAATGCCTCATCAGACTGAAAC 

Gene : HepCla 

Segment* : 109 
Offset : 1621 
1st Codon : 1 

HGPTPLLYRLGAVQHBVTLTHPVTKYIMTC 

Gene : HepCla 

Segment* : 110 
Offset : 1636 
1st Codon : 1

BVTLTHPVTKYIKTCHSADLBVVTSTWVLV 
GAGGTCACCCTCACCCATCCCGTCACCAAATACATTATGA^ 



Gene : HepCla
Segment* : 111 
Offset : 1651 
1st Codon : 1 

MSADLBVVTSTWVLVGGVLAALAAYCLSTG 
ATCTCC GC CGATCTGGAAGTGGTCACCTCCACCT 

Gene : HepCla 

Segment* : 112 
Offset : 1666 
1st Codon : 1 

GGVLAALAAYCLSTGCVVIVGRIVLSGK PA 
QGCXXiAGTCCTlXSCCCCTCTOGCTO 

Gene : HepCla 

Segment* : 113 
Offset : 1681 
1st Codon : 1 

CVVIVGRIVLSGKPAI I PDRBVLYRBFOBM 
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGC^ 

Gene : HepCla 

Segment* : 114 
Offset : 1696 
1st Codon : 1 

IIPDRBVLYRBPDBMBBCSQHLPYIBQGMM 
ATCATTCCCGATAGGGAAGTCCTCTACAGAG 



Gene : HepCla 

Segment* : 115 
Offset : 1711 
1st Codon : 1 

BBCSQHLPYXBQGHMLABQFKQKALGLLQT 
GAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAA^ 

Gene : HepCla 

Segment* : 116 
Offset : 1726 
1st Codon : 1 

LABQPKQKALGLLQTASRQABVIAPAVQTM 
Gene : HepCla

Segment* : 117 
Offset : 1741 
1st Codon : 1 

ASRQABVIAPAVQTNWQKLB 

GCCTCCAGGCAAGCCGAAGTGAT 



FHAKHMWNP 
ATATGTGGAACTTT 



Gene : HepCla 

Segment # : 118
Offset : 1756 
1st Codon : 1 

NQKLBVPWAKHMWtfPISGIQYLAGLSTLPG 
TGCCAAAACCTCCACGTITICT^ 

Gene : HepCla 

Segment* : 119 
Offset : 1771 
1st Codon : 1 

ISGIQYLAGLSTLPGNPAI AS LMAFTAAVT 
ATCTCCGGCATTCAGTATCTGGCrGGCCTCAGC^ 




Gene : HepCla

Segment # : 120
Offset : 1786
1st Codon : 1

AACCCTGCCATTGCCTCCCTGA'



QTLLPNILG 
rCCTCCTGTTTAACATTCTGGGA 



Gene : HepCla 

Segment* : 121 
Offset : 1801 
1st Codon : 1 

SPLTTSQTLLFH ILGGN 
AGCtXTl"lX*ACAACtJTLtXJVGACAClt^^ 



VAAQL.AAPGAATA 



Gene : HepCla
Segment* : 122 
Offset : 1816 
1st Codon : 1 

GWVAAQLAAPGAATAPVCAGLAGAAIGSVG 
?TGGCrGCCCCTGGCGCTG C C A CAGCCITT^ 



Gene : HepCla 

Segment # : 123
Offset : 1831 
1st Codon : 1 
PVGAGLAGAA 



L A G Y G A G 

ATACGGAGCCGGA 



Gene : HepCla 

Segment # : 124
Offset : 1846 
1st Codon : 1 

LGKVLVDI LAGYGAGVAGALVA PKI M S G B V 
CTGGGAAAGGTCCTGGTCEACATTCTGGCTGGCTO 



Gene : HepCla 

Segment* : 125 
Offset : 1861 
1st Codon : 1 

VAGALVAFKZMSGBVPSTBDLVNLLPAILS 

ATTATCTCCGGCCAACTGCCTAGCACAGACCAre 



Gene : HepCla

Segment # : 126
Offset : 1876
1st Codon : 1

PSTBDLVNLLPAILSPGALVVGVVCAAILR



CCCTCCACOGAAGACCTCGTGAATCTGCTCCCCGCT 



Gene : HepCla

Segment # : 127
Offset : 1891
1st Codon : 1

PGALVVGVVCAAILRRHVGPGBGAVQWMNR 

Gene : HepCla 

Segment # : 128
Offset : 1906
1st Codon : 1

RHVGPGHGAVQWMNRLIAPASRGNHVS PTH 
AGGCATGTGGGACCOGGAGAGGGAGCOGTCCAGTGGATC 

Gene : HepCla 

Segment* : 129 
Offset : 1921 
1st Codon : 1 

LIAFASRGNHVSPTHYVPBSDAAARVTAI L 
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTC^ 

Gene : HepCla 

Segment* : 130 
Offset : 1936 
1st Codon : 1 

YVPBSDAAARVTAI LSSLTVTQLLRRLHQW 
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAG 

Gene : HepCla 

Segment* : 131 
Offset : 1951 
1st Codon : 1 

SSLTVTQLLRRLHQMISSECTTPCSGSWLR 
AGCTCCCTGACAGTGACACAGCTCCTGA^ 

Gene : HepCla 

Segment* : 132 
Offset : 1966 
1st Codon : 1 

ISSBCTTPCSGSHLRDIHDNICBVLSDPKT 
ATCTOeAGCGAATTXZACAACCCCTTCCTCC O GC^ 

Gene : HepCla

Segment # : 133
Offset : 1981
1st Codon : 1 

DIWDKICBVLSDPKTWLKAKLHPQLPGI PF 

GgWCATTTQQGATTGG ATTTGOGA AGTCCTCAGCGA 

Gene : HepCla 

Segment* : 134 
Offset : 1996 
1st Codon : 1 

WLKAKLMPQLPGIPPVSCQRGYKGVWRGDG 
TGGCTCAAGGCTAAGCTCATGCCTCAGCTCCCCGGA A ' 

Gene : HepCla 

Segment # : 135
Offset : 2011 
1st Codon : 1 

VSCQRGYKGVWRGDGXHHTRCHCGABI TGH 
GTGTCCTGC CA AAGGGGATACAAAGCCGTCTG GA 

Gene : HepCla 

Segment* : 136 
Offset : 2026 
1st Codon : 1 

IHHTRCHCGAB ITGHVKNGTMRI VG PRTCR 
ATCATGCACACAAGGTGTCACTGTGGCGCTGAGATTAC^ 

Gene : HepCla

Segment # : 137
Offset : 2041
1st Codon : 1

VKNGTMRIVGPRTCRNMWSGTPP I N A Y T T G
GTGAAAAACXXSAACCATGAGGATTGTGGGACCCAG^ 

Gene : HepCla

Segment # : 138 
Offset : 2056 
1st Codon : 1 

NMWSGTPPINAYTTCPCTPLPAPNYTPALW 

AACATGTGGTCaXK3VCATTCCCrATCAATGCCTATO 

Gene : HepCla 

Segments : 139 
Offset : 2071 
1st Codon : 1 

PCTPLPAPHYTPALWRVSABBYVBI RRVGD 

CCCTGTACCCCTCTGCCTGCCCCTAACTATA ^ . T*Y7rcCGCOGAAGAGTATGTGGAAATCAGAAGGGTCGGCG^ 

Gene : HepCla

Segment* : 140 
Offset : 2086 
1st Codon : 1 

RVSABBYVBIRRVGDFHYVTGMTTD HLKCP 
AGGGTCAGCGCTGAGGAATACGTOGAGATTAGGAGAGTGG 

Gene : HepCla

Segment # : 141
Offset : 2101 
1st Codon : 1 

PHYVTGMTTDWLKCPCQVPS PBPPTBLDGV 
ITUtJATlViCGTCACCGGAATGACAACCGATAAC^ 1 it, l TTA CCGAACTCGATGGCGTC 

Gene : HepCla 

Segments : 142 
Offset : 2116 
1st Codon : 1 

CQVPS PBPFTBLDGVRLHRPA P P C K PLLRB 
TGCCAAGTXXXrrAGCCCTGAGTTrr^ 

Gene : HepCla 

Segments : 143 
Offset : 2131
1st Codon : 1

RLHRFAPPCKPLLRBBVSPRVGLHBYPVGS
SAGAGGAAGTGTCCTTCAGAGTGGGACTGCATGAGTAT

Gene : HepCla

Segment # : 144
Offset : 2146
1st Codon : 1

BVSFRVGLHBYPVGSQLPCEPBPDVAVLTS



Gene : HepCla 

Segment* : 145 
Offset : 2161 
1st Codon : 1 

QLPCBPBPDVAVLTSHLTDPSHITABAAGR 

Gene : HepCla 

Segment* : 146 
Offset : 2176 
1st Codon : 1 

MLTDPSHITABAAGRRLARGS PPSMASSSA 
ATGCTCACCGATCCCTCCCACATTACC^ 

Gene : HepCla 

Segments : 147 
Offset : 2191 
1st Codon : 1 

RLARGSP P S M A S S SASQLSAP SLKATCTAN 
TAGCATGCCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCrc 
Gene : HepCla

Segment # : 148
Offset : 2206
1st Codon : 1

3 Q ****** s & K * T C TAMHDSPDABLIBANLLW 

AGCCAACrGTCCGCCCCTAGOCTCAACGCTACCT^ 

Gene : HepCla 

Segment # : 149
Offset : 2221
1st Codon : 1

HDS PDA8LI BANLLWRQBMGGNITRVBSKN 

CAa»TAGCCCTGAOXnt3AGCTCATCGA 

Gene : HepCla 

Segment # : 150
Offset : 2236
1st Codon : 1

RQBMGGHITRVBSBMKVVILDSPDPLVABB 
AGGCAAGAGATQQGCGGAAACATTACCAGAGTGGAAAGCGAAAA 

Gene : HepCla 

Segment # : 151
Offset : 2251 
1st Codon : 1 

KVVXI»DSFDFZ.VABBDHRBISVPASXI,RICS 
AAGCTCGTt»TTCTGGATA<XTTTX»^ 

Gene : HepCla 

Segment # : 152
Offset : 2266 
1st Codon : 1 

° B R B - S V P A BIX * RKSRRFA QAI.PVWARPDy 
CWCGAAAGGGAAATCTCCOTQCCTGCCGAAATCCnCA^ 

Gene : HepCla 

Segment* : 153 
Offset : 2281 
1st Codon : 1

RRFAQALPVtfARPDYNPPLVBTIfKKPDYRP 
AGGAGATTCGCTCAGGCTCrGCCTCTO 

Gene : HepCla 

Segment* : 154 
Offset : 2296
1st Codon : 1 

HPPLVBTWKKPDYBPPVVHGCPLPPPRSPP 
AACCCTCOCCTCGTGGAAACCIGGAAGAAACCCGAT^ 

Gene : HepCla 

Segment* : 155 
Offset : 2311 
1st Codon : 1 

PV V H G CP LPPPRSPPVPPPRKKRTVVLTBS 
CCam^ft^TQGCTCTCC^^^ 

Gene : HepCla 

Segment* : 156 
Offset : 2326 
1st Codon : 1 

VPP PR KKRTVV LTBS TL S TALABLATK S PG 
GTGCCTCCCCCTAGGAAAAAGAGAACXGT^^ 

Gene : HepCla 

Segment* : 157 
Offset : 2341 
1st Codon : 1 

TLSTALABLATKSPGSSSTSGITGDNTTTS 
ACCCTCAGCACAGCCCTCGCCGAACTGGCT 

Gene : HepCla 

Segment # : 158
Offset : 2356
1st Codon : 1
S S S T S G 



I TGDNTTT 



BPAPSGCPPDSDAB 



Gene : HepCla 

Segment* : 159 
Offset : 2371 
1st Codon : 1

SEPAPSGCPPDSDAESYSSMPPLBGBPGDP 

TCCGACGCTGJWGTCCTACTCCAGCATGCCC^^ 



Gene : HepCla

Segment # : 160
Offset : 2386
1st Codon : 1

SYSSMPPLBGBPGDPDLSDGSHSTVSSBAG 
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAG^ 

Gene : HepCla 

Segment* : 161 
Offset : 2401 
1st Codon : 1 

DLSDGSWSTVSSBAGTBDVVCCSMSYSWTG 
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCT^ 

Gene : HepCla 

Segment* : 162 
Offset : 2416 
1st Codon : 1 

TKDVVCCSMSYSWTGALVTPCAABB 
ACCGAAGACGTCGTGTGTTGCTCCATGTCCT 

Gene : HepCla 

Segment # : 163
Offset : 2431 
1st Codon : 1 

ALVTPCAAHKQKLPINALSNSLLRHHNLVY 
GCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAA 

Gene : HepCla 

Segment* : 164 
Offset : 2446 
1st Codon : 1 

NALSNSLLRHHNLVYSTTSRSACQRQXKVT 
AACGCTCTGTCC A ACTCCCT GC TCACGC A 7CACAATCTC 

Gene : HepCla 

Segment* : 165 
Offset : 2461 
1st Codon : 1 

STTSRSACQRQKKVTPDRLQVLDSHYQDVL 
AGCACAACCTCCAGGTCCGCCTGTCA^ 

Gene : HepCla 

Segment* : 166 
Offset : 2476 
1st Codon : 1 

FDRLQV LDSHYQDVLKBVKAAASKVKANLL 
TTCGATAGGCTCCAGGTCCTGGATAGCCATTAC 

Gene : HepCla 

Segment* : 167 
Offset : 2491 
1st Codon : 1 

K B V K A A A S K V K AHLLSVBBACSLTPPHSAK 
AAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCT^^ 

Gene : HepCla 

Segment # : 168
Offset : 2506
1st Codon : 1

S V B B A C SLTPPHSARS KFGYGAKOVRCHAR 
AGCGTCGAGGAAGCCTGTAGCCTCACCCrTtrCCAT 
Gene : HepCla

Segment # : 169
Offset : 2521
1st Codon : 1

S K F G Y G A KDVRCHARKAVAH INSVWKDZjLB 
AGOAATTCGGATACGGAGCCAAAGACGTCAGC 

Gene : HepCla 

Segment* : 170 
Offset : 2536
1st Codon : 1 

KAVAH IHSVWXDLLBDSVTPIDTTIMAXNB 
AAGGClGTGGCTCACATTAACTCCGTGTt^^ 

Gene : HepCla 

Segment # : 171
Offset : 2551
1st Codon : 1 

DSVTP IDTTIMAXNBVFCVQPBKGGRXPAR 
OACTCCCmSACACCX^TroACACAACC^ 

Gene : HepCla 

Segment* : 172 
Offset : 2566 
1st Codon : 1 

VFCVQ PBKGGRXPARLXVPPDLGVRVCBKM 
GTCTTTI G COTC CA GCCTCACAAAGCCGGAAGGAAACCCGCT 



Gene : HepCla

Segment # : 173
Offset : 2581
1st Codon : 1

LIVPPDLGVRVCBKMALYDVVSKLPLAVMG
CTGATTGTGTT"


Gene : HepCla

Segment # : 174
Offset : 2596
1st Codon : 1

ALYDVVSKLPLAVMGSSYGPQYSPGQRVBF
GCCCTCTACGA1
CTGCCTCTGGCTGTGATGGGCTCCAGCTATO


Gene : HepCla

Segment # : 175
Offset : 2611
1st Codon : 1

SSYGFQYS PGQRVBPLVQAWXSKXTPMG PS 

AGCTCCTACGGATTCCAATACIXXCCCGGACAGAGAGT^^ 

Gene : HepCla 

Segment* : 176 
Offset : 2626 
1st Codon : 1 

L V 0 A W X 3 KKTPMGFSYDTRCF'DSTVTBS DI 
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGG 

Gene : HepCla 

Segment* : 177 
Offset : 2641 
1st Codon : 1 

YDTRC FD STVT BSDI RTBBA I YQCCDLD PQ 
TACGATACCAGATG Cl ' J I GA CTCCACCGTCACCGAAAGCGATATCAGAATO 

Gene : HepCla 

Segment* : 178 
Offset : 2656 
1st Codon : 1 

RTBBA IYQCCDLDPQARVAI KSLTBRLYVG 
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCG^ 

Gene : HepCla 

Segment* : 179 
Offset : 2671
1st Codon : 1 

A R V A I KSLTBRLYVGGPLTNSRGBNCGYRR 
GCCAGAGTGGCTATCAAAAGCCTCACXGAAAGGCTCT 

Gene : HepCla 

Segment # : 180
Offset : 2686
1st Codon : 1

GPLTHSRGBHCGYRRCRASGVLTTSCGNTL 
GGCCCTCTt^CAAACTCCTWGGGGAGAGA^ 

Gene : HepCla 

Segment* : 181 
Offset : 2701 
1st Codon : 1 

C R A S G V L TTSCGNTLTCYI KARAACRAAGL 
YGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACAC^ 

Gene : HepCla

Segment # : 182
Offset : 2716
1st Codon : 1 

TCY IKARAACRAAGLQD CTM1»VCGDDLVVI 
ACCTGTTACATTAAGGCTAGGGCTGCCTGTAG 

Gene : HepCla

Segment* : 183 
Offset : 2731 
1st Codon : 1 

0 D C T M L V C G D D L V VI CB SAGVQBDAAS LRA 
CAGGATTGCACAATGCTCGTCTGTGGCX!ATGX< 

Gene : HepCla 

Segment # : 184
Offset : 2746 
1st Codon : 1 

CBSAGVQBDAASLRAFTBAMTRYSAPPGDP 

TGCCAAACCCCTCGCCnCCAGGaAGACGC^^ 

Gene : HepCla 

Segment # : 185
Offset : 2761 
1st Codon : 1 

PTBAMTRYSAPPGDPPQPBYDLBLITSCSS 
TTCACAGAGGCTATGACAAGCTATAGCGCTCCCCCTG 

Gene : HepCla 

Segment* : 186 
Offset : 2776 
1st Codon : 1 

PQPBYDLBLITSCSSNVSVAHDGAGKRVYY 
CCCCAACOCGAATACX^TCTGGAACTGATTAC^ 

Gene : HepCla 

Segment* : 187 
Offset : 2791 
1st Codon : 1 

NVSVAHDGAGXRVYYLTRDPTTPLARAAWB 

AACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAG 

Gene : HepCla 

Segment* : 188 
Offset : 2806 
1st Codon : 1 

I*TRDPTTPLARAAWBTARHTPVNSWLGNI I 



Gene : HepCla

Segment # : 189
Offset : 2821
1st Codon : 1



TARHTPVNSNLGNIIMFAPTLWARMILMTH 
ACCGCTAQGO^TACCCCTGTGAATAGCTGGCTGGG^

Gene : HepCla 

Segment* : 190 
Offset : 2836 
1st Codon : 1 

HPAPTLWARMIJUMTHPFSVLIARDQLBQAL 
A1CT T TC CCCCTACCCTCTG«X^ 

Gene : HepCla 

Segment* : 191 
Offset : 2851 
1st Codon : 1 

PFSVLIARDQLBQ/ALDCBI YGACY S IBPLD 
TTCTTTAGCGTCCT G ATTGCCACAGACX!AAC^ 

Gene : HepCla 

Segment* : 192 
Offset : 2866 
1st Codon : 1 

DC BIYGACYS IBPLDLPPI I QRLHGLSAFS 
GACTOTCECATTTACGGAe CCTCTT ACTC 

Gene : HepCla 

Segment* : 193 
Offset : 2881
1st Codon : 1 

LPPIIQRLHGLSAPSLHSYS PGBINRVAAC 

Gene : HepCla 

Segment* : 194 
Offset : 2896 
1st Codon : 1 

LHSYSPGBIHRVAACLRKLGVPPLRAWRHR 
CTGCATAGCTATAGCCCTGGCX5AAATCAATAGGGTCGCCGC 

Gene : HepCla 

Segment* : 195 
Offset : 2911 
1st Codon : 1 

LRKLGVPPLRAWRHRARSVRARLLARGGRA 
CTGAGAAACCTCCCCGTCCCCCCTCT G AG 

Gene : HepCla

Segment* : 196 
Offset : 2926 
1st Codon : 1 

ARSVRARLLARGGRAAICGKYLPNMAVRTK 

Gene : HepCla 

Segment* : 197 
Offset : 2941 
1st Codon : 1 

Al CGKYLPNHAVRTKLXLT P IAAAGRLDLS 
GCCAT1TGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAA 

Gene : HepCla 

Segment* : 198 
Offset : 2956 
1st Codon : 1 

LKLTPIAAAGRLDLSGNFTAGYSGGDIYHS 
CTGAAACTGACACCCATTGCCGCTGCCGGAA^ 

Gene : HepCla 

Segment* : 199 
Offset : 2971 
1st Codon : 1 

GW FTAGYSGGDI YHSVSHARPRHPHPCLLL 
GGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTACCAT^ 



Gene : HepCla 
Segment # : 200
Offset : 2986
1st Codon : 1

V5HARPRWPWPCLLLLAAGVGI YLLPNRAA 
GTGTCCCAOXrrA«XXrrAGGTGGTrCTGGTrCTG^ 



Gene : HepCla

Segment # : 201
Offset : 3001
1st Codon : 1

LAAGVGIYLLPNRAA
ATCTATCTGCTCCCCAATAGGGCTGCC 
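
The listing above follows a regular pattern: each full-length peptide has 30 residues, the offset rises by 15 from one segment to the next (segment 65 at offset 961, segment 201 at offset 3001), and the last segment is a shorter remainder. The following is a minimal sketch of that tiling, assuming the offsets count residues from 1 and that the scrambled order is an ordinary random permutation; the specification section shown here does not state how the order was chosen. The function names, the seed and the toy input are hypothetical and are used for illustration only. Codon selection for the nucleotide lines is not modelled.

```python
import random

SEGMENT_LENGTH = 30  # residues per full-length segment, as in the listing
STEP = 15            # offset increment between consecutive segments


def tile_segments(protein, length=SEGMENT_LENGTH, step=STEP):
    """Split a protein into overlapping segments.

    Returns a list of (segment_number, offset, peptide) tuples, where
    offset is the 1-based position of the segment's first residue.
    The final segment may be shorter than `length`.
    """
    segments = []
    for number, start in enumerate(range(0, len(protein), step), start=1):
        segments.append((number, start + 1, protein[start:start + length]))
    return segments


def scramble(segments, seed=0):
    """Return a shuffled copy of the segments (assumed method, see lead-in)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy 45-residue input for illustration only; not a HepCla sequence.
toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWYACDEF"
tiled = tile_segments(toy_protein)
for number, offset, peptide in scramble(tiled):
    print(f"#{number} offset {offset}: {peptide}")
```

Under these assumptions the offset of segment n is 15 x (n - 1) + 1, which can be used to cross-check segment numbers and offsets in the listing.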



Segments in scrambled order:

HepCla #77 

V I PVRRRGD SRG 
GTGATTCCXGTCAGGAGAAGGGGAGA 



LLSPRPISVLKGSSGGP 
ATTAGCTATCTGAAAGGCTCCAGCGGAGGCCCT 



HepCla #68

ARRGRBXLLG 
GCCAGAAGGGGAAGGGAAAT 



PADGMVS RGWRLLA P ITAYA 
VTGGTCAGCAAAGGCTGGAGGCTCCTGGCTCCCATT^ 



HepCla #143 

RLHRPAPPCKPLLR 
AGGCTCCACAGAT 



B B 



VSPRVGLHBYPVGS 
rTGTCCITCAGAGTGGGACTGCATGAGTATCCCGT^^ 



HepCla #66 

VVPSQMBTKLITWGADTAACGDIINGLPVS 
ATGGAGACAAAGCTCATCACATCGGGAGCCGATACCGCTCCC^ 



HepCla #79 
LLCPACHAV 



GIPRAAVCTRGVAKAV 



D P I P V 
ITTTCATTCCCGTC 



HepCla #113 

CVVIVGRI 
TGOGTCGTGATTGTGGGAAGGAT 



VLSGKPAI IPDREVLYRBPDBM 
^TTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATG 



HepCla #139 

PCTPLPAPNYTFALWRVSAEBYVB I RRVGD 
CCCrGTA(XXXrrCTGCCTGCCXCTAAC^ 

HepCla #174 

ALYDVVSKLPLAVMGSSYGPQYSPGQRVBP 
GCCCTCrAa^TGTCCTCMX^AACTGCCTCTGG^ 

HepCla #57 

ISNCLWWLQYFLTRVBAQLHVNVPPLHVRG 
ATCXarrGGTGTCTGTGGTGGCTCCACTATTTC 

HepCla #51 

ENLVILNAAS LAGTHGLVSPLVPPC PAWYL 
GAGAATCTGGTCATCCTCAAOG C TGCCTCCCIC GC TX X 5CACA 1 1 iLlU,lVn vm - l t X M rriXX CrGGTACCTC 



HepCla #193 

L P^ P I I Q R L H GLSAPSLHSYS PGBINRVAAC 
CTGCCTCCCATTATCCAAAGGCKX!ACGGACTGTCC^ 




HepCla #48 

G V G S S I 
GGCGTCGGCTCCAGCAT 



ASWAIKWBYVVLLPLLLADARVCS 
^TCAAATGGGAATACGTCGTGCrcCTGTrTCTGCTCCT^ 



HepCla #37 

LNHTRPPLGHWPGCTHHNSTG PTKVCGAPP 
CTGAATAACACAAGGCCTC(XCrCGGCAATrGGlTlXjX^ 



HepCla #185 

PTBAMTRYSAPPGDPPQPBYDL 



BLITSCSS 
TTCACAGAGGCTATCACAAGGTATAGCGCTCCCCCTGGCGA

PQRAYALDTBVAASCGGVVL 
nTACGCTCTGGATACCGAAGTOSCTGCCTCCTGC^^ 

HepCla #70

QQTRGLLGC I ITSLTGRDKNQVBGRVQIVS 
CAGCAAACCAGAGGCXTICCTGGGATGCATTATCA 

HepCla #82 

SSPPAVPQSPQVAHLHAPTGSGKSTKVPAA 
AGCTCCCCCCCTGCCGTCCXXX^AAGC^ 

HepCla #104

M T P C L P V C QDHLBPWBGVPTGT THIDAHPL 
AACACACCCCXSACTGCCTGTGTGTCAGGATCACCTCGA^ 

HepCla #26 

VLLLFAGVDABTHVTGGNAGRTTSGLVSLL 

GitXTCxi-nxrrciTO*:!^^ 
HepCla #110 

BVTLTHPVTKYIMTCMSADLBVVTSTWVLV 
GAGGTCACCCTCACCCATCCCGTCACCAA^ 

HepCla #56 

VGLMALTL S P Y YXRYISNCLWWLQYPLTRV 

GTGGGACTGATGGCVC'itlACCCTCAGCCXr^ 

HepCla #197 

A I C G K Y L P NWAVRTXLXLTPIAAAGRLDLS 
GCCATTTGCGGAAAGTATCTGTTTAACTGGGCCC 

HepCla #25 

I A Y P 3 HVGNWAKVLVVLLLPAGVDABTHVT 
ATCGCrrACTrTAGCATGGTCGGAAACTGGCCCAA^ 

HepCla #147 

RLARGSPPSHASS SASQLSAPS LKATCTAH 
AGGCTCGCCAGAGGCTCCCCCCCTAC^ 

HepCla #52 

GLVSPLVPPCPANYLKGRW 
QGCCTCOTGTCCTTCCTCGTOTTTTT C ICTT^ 

HepCla #145 

Q L _ P _ C 8 P B P p v A__V_ LTSMLTDPSHITABAAGR 
CAGCTCCCCTGTGAGCCTGAGCCTtiACGTCG^ 

HepCla #171 

DSVTPIDTTIHAKNEVPCVQPBKGGRJCPAR 
GACTCCGTGACACCCATTGACACAACCATTATGGCTAAGAATO 

HepCla #84 

YAAQGYKVLVLNPSVAATLGPGAYHSKAHG 
'ACGCTGCCCAAGGCTATAAGGTCCTGGTCCTGAATCCCTCtt 

HepCla #14 

V R II S T G L Y HVTMDCPHSSIVYBAADAILHT 

GTGJlGAAACItXACCGGACTGTATCAGGTCAC^ 

HepCla #175 

SSYGPQYSPGQRVBPLVQANKS KKTPNGPS 
AGCTCCTACXX^TTCCAATACTCCCCCGGACAGAGAGT^ 

NGLPVSARRGRB I LLGPADGH 
ntXrCCGTCAGCGCTAGGAGAGGCAGAGAGATTCTGCT 

HepCla #148 

SQLSAPSLKATCTANRDSPDABLI E A N L L W 

ATC^CTCCCCCGATGCCGAACTGATTGAGGCTAA^ 
HepCla #120

N PAIASLMAFTAAVTS PLTTSQTLLFN I L G- 
AACCCTGCC A TT GC CTOCClT^TGGCCTreACCGCT I-CClX/lTrAACATTCTGGGA 

HepCla #176 

LVQAWKSKKTPMGPSYDTRCPDS TVTBSDI 
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATCGGCTTTA^ 

HepCla #152 

D B R B I S V PARILRKSR R P AQALPVWARPDY 
GAOGAAAGGGAAATCTCCGTCCCTGCCGAAATCCTCAGGAAAACCA 

HepCla #190 

MPAPTI/HARM X LHTHPPSVLIARDQLBQAL 

ATGTTTGCCCCTACCCTCTGGGCTAGC ATO ATCCTCATGACACACTi i ^CTCCX^t^TCATOGCTAGGGATCAGCTCGAGCAAGCCCTC 

HepCla #96 

SVI PTSGDVVVVATDALMTGYTGDPDSVID 
AGCGTCATCCCTACCTCCGGCGATGTGGTCGTGGTC^ 

HepCla #94 

CHS KKKCDBLAAKLVALGINAVAYY RGLDV 
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTC 

HepCla #46 

VLPCSFTTLPALSTGLIHLHQNI VDVQYLY 
GTGCTCOCCTGTAGCTTTACCACACTGCCTGCC^ 

HepCla #53 

KGRWVPGAVYALYGMWPLLLLLLALPQRAY 
AAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTA 

HepCla #87 

SPITY3TYGKFLADGGCSGGAYDII ICDBC 
AGCCCTATCACATACTCCACCTATGGCAAATTCCTCGC^ 

HepCla #196 

A R S V R A R U I> A RGGRAAICGKYLFNWAVRTK 
GCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAG 

HepCla #170 

KAVAH XHSVWKDLLBDSVTP I DTTI M A K N B 
AAGGCTGTGGCTCACATTAACTCCGTGTGGA^ 

DR5GAPTYSWGANOTDVFV 

TAGGTCCGGCXKTTCCCACATACTCCTGGGGA^ 

HepCla #16 

PGCVPCVR BGNASRCWVAMTPTVATRDGKL 
CCCGGATTX3GTCCCCTGTGTGAGAGAGGGAAACGCTAG 

HepCla #183 

QDCTMLVCGDD LVVI CBSAGVQBDAASLRA 
CAGGATTGCACAATGCrCGTGTGTGGCGATGACCTC^ 

HepCla #125 

VAGALVAFJCIMSGBVPSTBDLVNLLPAILS 
GTGGCTGGCGCTCK^'fCtXtlTrrA 

HepCla #177 

YDTRCFDSTVTBSDI RTBBAI YQCCDLDPQ 
TACGATACCAGATGCTTTGACTCCACCGTCACCXSAAAGCGA 

HepCla #103 

BLTPABTTVRLRAYMNTPGLPVCQDHLBFW 
GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCT 

HepCla #186

PQPBYDLBLITSCSSNVSVAHDGAGKRVYY 
CCCCAACCOGAATACGATtTTGGAACTGATTACCTCCI^ 
HepCla #9

LGKVI DTLTCGFADLMGYI PLVGAPLGGAA 
CTGGGAAAGOTCATCGATACCCTCACCTGTGGCTTI^ 

HepCla #93

AI PLBVIKGGRHL I P CHSKKKCDBLAAKLV 
GCCATTCCCCTCGAGGTCATCAAAGGCGGAAGGCATCTGATTTT^ 

HepCla #112

GGVLAALAAYCLSTGCVVXVGRIVLSGKPA 
GGCGGAGTtXrKXXXGCTCTGGCTGCCT^ 

HepCla #184

CBSAGVQBDAASLRAPTBAMTRYSA'PPGDP 
TGCGAAAGtfXXTTGGCGTCCAGGAAGACCCT 

HepCla #199

GWPTAGYSGGDIYHSVSHARPRWPWFCIiLL 
GGCTGGTTCACAGCCQGATACTCCGGCGG^ 

HepCla #158

SSSTSGITGDNTTTSSBPAPSGCPPDSDAB 
AGCTOCAGCACAAGOGGAATCACAGGCGATAAC^ 

HepCla #100

RTQRRGRTGRGKPGIYRFVAPGBRPSGMPD 
AGGACACAGAGAAQGGGAAQGACAQGCAGAGGCAAACCCGGAATCT 

HepCla #43

VRHYVGGVBHRLBAACNWTRGBRCDLEDRD 
GTGAGAA'ICTATGTGGGAGGCGTrCGftGC^TAGGC^ 

HepCla #58

BAQLHVWVPPLHVRGGRDAVILLKCVVHPT 
GAGGCTCAGCTCCACGTCTCGGTOLXXXXrrCr G 

HepCla #4

LGVRATRKTSBRSQPRGRRQPI PKARRPBG 

HepCla #187

fcfVSVAHDGAGKRVYYLTRDPTT PLARAAWR 
AAOGTCAGOGTCGCCCATGACGGAGCCGGAAAGAGAC^^ 

HepCla #159

3 BPAPSGCPPDSDABSYSSMPPLBGBPGDP 
AGCGAACCCGCTCCCTCCGGCTGT^^ 

HepCla #63

I GGHYVQMAZ IKLGALTGTYVYNHLTPLRD 
ATCGGAGGCCATTACGTCX!AGATGGCCATTATCAAA 

HepCla #126

PSTBDLVNLLPAI LSPGALVVGVVCAAI LR 
CCCIXrCACOGAAGACCTCGTGAATCIXXrre 

HepCla #24

ILDHZAGAHWGVLAGZAY Fr^^m VGNHAKVLV 
ATCCICGACATCATOGClCCCGCrCACTGGGGCGTC^^ 

HepCla #7
BGCGWAGMLL 

GAGGGATGCGGAT 

HepCla #21

WTTQGCMCSIYPGHITGHRMAWDMMMNWSP 
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCT^ 

HepCla #17

WVAHTPTVAT.RDGKLPATQLRRHXDLLVGS 
TGGGTCGCCATGACCCCTACCGTCGCCACA^ 

HepCla #42
RItWHYPCTINYTI P KVRHYVGGVBHRLBAA
AGGCTCTGGCATTACCCTTGCACAATCAATTAXIACAATCTT^ 

HepCla #172 

V y C VQPBKGGRKPARLIVFPDLGVRVCBKM 
G TC TTTT tSCGTC CAGCCTCAGAAAGGCQGAAGGAAAC^ 

HepCla #10 

M G Y 1 P LVGAPLGGAARALAHGVRVLEDGVN 
ATGGGATACATTCCCCTCGTGGGAGCCCCTXrTG^ 

HepCla #27 

GGNAGRTTSGLVSLLTPGAKQNIQLIMTNG 
CGCGGAAACGCTGGCAGAACCACAAGCGGACrGGrrCRGCCICCrGACACTO 

HepCla #13 

LALLSCLTVPASAYQVRNSTGLYHVTNDCP 
CTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTC^^ 

HepCla #71 

GRDKMQVBGSVQIVSTAAQTPLATCINGVC 
GGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAG^ 

HepCla #18

PATQLRRHIDLLVGSATLCSALYVGDLCGS 
CXXtXrrACCCAACTGAGAAGGCATATCGATCTGCTOC? 

HepCla #83 

H APT G S G K S T K V P A AYAAQGYKVLVLN PSV 
CACGCTCCCACAGGCTCCGGCAAAACX3VCAAAGGTCCC^ 

HepCla #6

RTHAQPGYPNPLYGNBGCGHAGHLLSPRGS 
AGGACATGGGCTCAGCCTCGCrATCXCTGGCXXCT^ 

HepCla #162 
T B D V V C C 



HepCla #55 

ALDTKVAASCGGVVLVGLMALTLS PYYKRY 
GCCCTCGACACAGAGGTOGCCGCTAGCTGTGGOGGAGTGGTCCI^ 

HepCla #38

WMHSTGFTKVCGAPPCVIGGAGMNTLHCPT 
TQGATGAACTCCACCQGATTCACAAAGCTCTt^^ 

HepCla #168 

SVBBACSLTPPHSAKSKPGYGAKDVRCHAR 

AGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCC^T^ 

HepCla #119 

ISGIQYLAGLSTLPGNPAIASLMAPTAAVT 
ATCTCOGGCATTCAGTATCTGGCll^GCCTC 

HepCla #3 

QIVGGVYLLPRRGPRLGVRATRKTSBRSQP 
CAGATTCTGGGAGGCXntrrACCrCXrrGCCT 

HepCla #194 

L H S Y S P G B I N R V A A C LRKLGVPPLRAWRHR 
CIGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGC^ 

HepCla #189 

TARHTPVNSWLGNI IHPAPTLWARMILMTH 

ACCGCTAGGCATACXXCreTG A ATAGCTQGCTGGGAAACaTTAT^ 

HepCla #81 

BHLBTTHRSPVPTDHSS PPAVPQSPQVAHL 
GAGAATCTGGAAACCACAATGAGAAGCCCTGTG^ 

HepCla #91 

ATP PGSVTVPH PMI EEVALSTTGB I PPYGK 
kCCCTAACATTGAGGAAGTGGCTCTGTCCACCAaVGGCGAAATCCCTTTCTATTC



HepCla #60

LVFDITJCLLLAVFC PLHI LQAS LLKVPYPV 



HepCla #23 

TAALVMAQLLR I PQAILDMIAGAHWGVLAG 
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGG^ 

HepCla #98 

CNTCVTQTVDPSLOPTFT IBTTTLPQDAVS 
TGCAATACCTGTTTTGACACAGACAGltX^TTTCT 

HepCla #109 

HGPTPLLYRLwAVQNBVTLTHPVTKYI MTC 
»CGGACCCACACXXXTCCTGTATAGGCTCGGCGC^ 

HepCla #179 

A R V A IKSIiTBRl»YVGGPLTNSRCBNCG YRR 
GCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCT 

HepCla #39 

CVIGGAGMNTLHCPTDCPRKHPBATYSRCG 
TGCGTCATCGCAGGCX3CTGGCAATAACACACTGCATO 

HepCla #76 

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL 
ACCTGTGGCTCCAGCGATCTGTATCTGGTCAC^ 

HepCla #138 

HMWSGTPPIHAYTTGPCTPLPAPNYTPALW 
HepCla #89 

HSTDAT8I LGIGTVLDQABTAGARLVVLAT 
atCTCCACCCATGCCACAACCATTCTGGGAA 

HepCla #130 

YVPBSOAAARVTAI LSSLTVTQJULRRLHQW 
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTCAC^ 

HepCla #8 

RPSWGPTDPRRRSRWLGKVIDTLTCGPADL 
AGGCCTAGCTGGGGCCCTACCGATCCCXGAAG 

HepCla #33 

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC 
GGCCCTGACCAAAGGCCTTACTGTIGGCATTA 

HepCla #115 

BBCSQHLPYIBQGMMLABQPKQXALGLLQT 
GAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAAGG 

HepCla #107 

YQATVCARAQAP PPSKDQMHKCLI RLKPTL 
TACCAAGCCACAGIGTGT G CCAGAGCCCAAGC^ 

HepCla #34 

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG 
TGCCCJUVTCGTCCCCGCTAAGTCCXyrGTCTG GC ^ 

HepCla #131

SSLTVTQLLRRLHQWISSBCTTPCSGSWLR 
AGCTCCCTGACAGTGACACAGCTCCTGAGAAG 

HepCla #161 

DLSDGSWSTVSSBAGTEDVVCCSMSYSWTG 
HepCla #108 

WDQMWKCLIRLK PTLHGPTPLI*YRI*GAVQH 
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAA^ 



CTGGTCTTCGATA" 




kTTTCGTC 
HepCla #116

LABQFKQKALGLLQTASRQABVIAPAVQTN 
CTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCT 

HepCla #118 

MQK LBVFWAKHMWNF ISGIQYLAGLSTLPG 
TGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACAT^^ 

HepCla #129 

L I A F A S ROM H V S P T H Y V PBSDAAARVTAIL 
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTCnX:CCCC^ 

HepCla #19 

ATLCSALYVGDLCGSVFLVGQLFTPSPRRH 
GCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCI^ 




HepCla #29 

S W H IHSTALNCNBSLNTGWLAGLPYQHKFN 
AGCTGGCACATTAACTOCACCGCrCTGAATTGC^ 

HepCla #164 

HALSNSLLRHHHI.VYSTTSRSACQRQK1CVT 
AACGCTCTGTCCAACTCXXITGCrcAGGCATCAC^ 

HepCla #1 

AAMSTNPKPQRKTKRNTNRRPQDVKPPGGG 
GCCGCTATGTCCACCAATCCCAAACCCCAAAGGA 

HepCla #106 

SQTKQSGBNFP-YLVAYQATVCARAQAP P PS 
AGOCAAACCAAACAGTCCGGCGAAAACTTTCCCT^ 

HepCla #36 

APTYSNGAHDTDVFVLHNTRPPI>GHWFGCT 
HepCla #156 

V P P PR KKRTVVLTBSTLSTALABLATKS FG 
GTGCCTCCCCCTAGGAAAAAGAGAACCGTOGTGCTCACC^ 

HepCla #165 

STTSRSACQRQJCKV. TFDRLQVLDSKYQDVL 
AGCACAACCTCCAGGTCCGCCTGTCAGAGACAGAJ^ 

HepCla #90 

DQABTAGARLVVLATATPFGSVTVPHPNIB 
HepCla #141 

FHYVTGMTTDNLKCPCQVPS PEFFTBLDGV 
TTCCATTACGTCACCGGAATGACAACCGATAACCTCAACT 

HepCla #198 

LKLTP IAAAGRLDLSGWFTAGYSGGDIYHS 
CTGAAACTGACACCCATTGCCGCTGCCGGA 

HepCla #117 

ASRQA EVIAPAVQTNWQKLBVFWAKHMHHF 
HepCla #181 

CRASGVLTTSCGNTLTCYIKARAACRAAGL 
TGCAGAGCCItXGGCGTCCTGACAACCTCCTGCGGAAJVCACA 
HepCla #166

FD R L Q V L P SHYQDVLKEVKAAAS K V JC A H < L L 
TTCGATA<XXrrCCACOTtXTC»TAGCCATTACCA 

HepCla #180 

0 F I> T M 8 R OBMCGYRRCRASGVLTTSC 0 M T I* 
CGCCCTCTGACAAACTCCAGOGGAGAQAATTOCCQA^ 

HepCla #136 

1 M H T R C H C G AB ITGHVKM GTMRI V G P R 't C R 

ATCfcTGCACACAAGGTGTCACTGTGGCGCTGA 

HepCla #144 

BVSFRVG LHBYPVGSQLPCBPBPDVAVL TS 
CAGCTCACCTTTACOQTCOOCCTC CA CCM 

HepCla #167 

KBVKAAASKVKANLLSVBBACSLTPPHSAK 
MGOAACTGAAAGCCCCTGCCTCCAA^ 

HepCla #59 

GRDAVZ LLMCVVH P T I* V P D ITKLLLAV FGP 

OGCAGACACGCTCTGATTCTGCTCAIV^^ 

HepCla #146 

H L T D P 8 H ITABAAGRRLARGS PPSMA 3 3 3 A 

ATGCTCACCGATCCCrCCXZACATTACOGCTGAGC 

HepCla #78 

S PR PI SYLKGSSGG PLLCPAGHAVG X F R A A 

AGCCCTAOGCCTATCTOCTACCTCAAGGG 

PDQRPYCWHYP P K P 
ATCAGAGACCCTATTCCTGGCACTATCCCCC^ 

HepCla #128

B H V G P G KG A V Q If M II R L I A F A S R G N H V 3 P T H 

AGGCATGTGGGACCCGGAGAGGGACCCGTCCAGTGGATGA^ 

HepCla #50

C L W H M LL ISQABAALBNLVI L N A A S I» A G T H 
TQCCTCTOOATQATGCTCCTGATTAGCCAA^ 

HepCla #114 

X I PDRBVLYRBFD KMBBCSQHLPYI B Q 0 M M 
ATCATTCCCGATAGGQAAGTGCrCTACAGAGAGTTTGAOGAAAT^ 

HepCla #47 

L 1 H JL_? 0 * ?-- V - D _ V Q^>»TOVGSSIASWAl .|CHBY 
CTGATTCACCTCCACCAAAACATTGTGGATG^ 

HepCla #200 

VSHARPRWPWFCLLLLAAGVGXYLLP-NRAA 
GTGTCCCACGCTAGGCClAGOlt^lTtl^n^ 

HepCla #85 

A * T L ° P 0 AYMSKAHGI DP»IRTGVRTITTG 
GCCGCTACCCTCGGCrTTGGCGCTTACATGAGCAAAGCCCAT^ 

HepCla #62 

R V Q G I. I> R X C ALARKMIGGHYVQHAI XKLGA 
AGGGTCCAGGGACTGCTCAGGATTTGCGCTCTGGC^^ 

HepCla #153 

RRFAQALPVHARPDYNPPLVBTWK1CPDYBP 

AGGaG A TTCGCTCAOGCTCTCCCTGTGTGGG 

HepCla #72 

T A A Q T P l*ATCIMGVCWTVYHGAGTRTI ASP 
ACCGCTGCCCAAA CCTTTCIGGC TACCTGTATCAATC 

HepCla #65 
PSQHBTKLITHGA
TAGCCAAATGGAAACCAAACTGATTACCTGGGGCGCT 



HAHNGLRDLAVAVBPVV 
TGGGCTCACAATGGCCTCAGGGAT 

HepCla #74 

K ° _ P _ V I QMYTHVDQDLVGHPAPQGS RSLT PC 
AAGGGACCCGTCATCCAAATCTAraCCAATGTGGATCAG 



HepCla #151 

KVVILDSFDPLVA 
AAGGTOGTGATTCTGGAT 



B B 



DBRBISVPABILRKS 
^TTAGCGTCCCCGCTGAGATTCTGAGAAAGTCC 



HepCla #64 

h T G T Y V Y HHLTPLRDWAHHGLRDLAVAVBP 
CTGACAGGCACATACGTCTACAATCACCTCAC 

HepCla #80 

V C T RGVAKAVDFI PVBNLBTTMRS PV 
GTGTCTACCAGAGGCGTCGCCAAAGCCGTCGACTn 



HepCla #95 

ALGXNAVAYYRGLDVSVI PTSGDVVVV 
GCCCTCGGCATTAAOGCTGTG GC T t ACTATAGGGGACTG 



A T D 

kCCGAT 



HepCla #111 

MSADLBVVTSTMVLVGGVLAALAAYCLSTG 
ATGTCCGCOGATCTGGAAGTQGTCACXritXAC^^ 

HepCla #97 

All M TGYTGDPDSVIDCNTCVTQTVDPSLDP 
GCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGA 

HepCla #2 

HTNRRPQDVKPPGGGQIVGGVYLLPRRG PR 
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGC 

HepCla #11 

RALAHGVRVLBDGVHYATGNLPGCSPSI PL 
AGGGClCrGGCrCAOGGACrrGACAGTGC^^ 

HepCla #169 

SKPGYGAKDVRCHARKAVAHIHSVWKDLLB 
AGCAAATTCGGATACGGAGCCAAAGACXTrCAGGTGTCA 

HepCla #28 

TP G AKQNIQLIMTWGSWHINSTALKCHBSL 
ACCCX^GGaXTTAAGCAAAACATrCAGCTC^ 

HepCla #30 

HTGWLAGLFYQHKPNSSGCPBRLASCRRLT 
AACACAGGCTGGCTGGCTGGCCTCTTCT^ 

HepCla #49 

V V L L F L L L A DARVCSCLWMMLLISQABAAL 

GTGGTCCTGCTCTKXTCCTGCrCGCa» 

HepCla #192 

P C B I Y G A C Y SI BPLDLPPIIQRLHGLSAFS 
GACTGTGAGATTTACGGAGCCTGTTACTCC^ .^GOVTGGCCIX3U3CGClVit:AXX: 



HepCla #73 

W T V Y H G A G T R T I A S P K G P^ VIQMYTNVDQDL 

~ ~~ ~ *" ~ "~ ~ ' " ATTCAGATGTACACAAACXn^GACCAAGACCTC 



TGGACAGTCTATCAOGGAGCCGGAACCAGAACCAT 
CTCAGAAAGCTCGGCGTCTCCC^^
HepCla #121 

SPLTTSQTLLPNILGGW 
AGCCCIXTTGACAACCTCCCAGACACTGCTCTTC^ 

HepCla #61 

L W I L Q ASLLKVPYPVRVQGLLR ICAJLARKM 

CTGTGGATCCTCCAGGCTAGCCTCCTGAAAGT^ 

HepCla #137 

VKHGTHRIVGPRTCRNMWS GTFPINAYTTG 
CntiAAAAACXSGAACCATXSAGGATrGTGGGACCCAGAACCT^ 

HepCla #92 

g V A h 8 TTGBIPPYGKAI PLRVI KGGRHLIF 
GAGGTCGCCCTCAGCACAACCGGAGAGATXtXXrTTTTAOGG 

HepCla #188 

LTRDPTTFLARAAWBTARHTPVHSWLGNII 
CTOACAAGGGATCCCACAACCCCTCTGGCrAGGGCTGCCTGGGA^ 

HepCla #140

RVSARBYVEIRRVGDPHYVTGMTTDNLKCP 
AGGGTCAGCGCTGAGGAATAOGTCGAGATTAGGAGAGTGGGAGACTTTCACTAT^ 

HepCla #155 

P V V H G C P L P PPRSPPVPPPRKKRTVVLTBS 
CCX^TCGTCCATGGClX^CCCTtrCCXr^ 

HepCla #157 

TLSTALABLATJCS FGSSSTSGI TGDNTTTS 
ACCCTOlGCACAGCXX^CGCXXyuvCTGGCT 

HepCla #135 

V g C QRGYKGVWRGDGIMHTRCHCGABITGH 

GTGTCCTGCCAAAGGGGATACAAAGGCGTCltWAGAGGCGATO 

HepCla #20 

V F L V GQLFTFSPRRHHTTQGCHCSIYPGHI 
GTGTTTCTGGTCGGCCAACTGTTTACCTl^ 

HepCla #123 

FVGAGLAGAAIGSVGLGKVLVDILAGYGAQ 
TTCGTCGGCGCIGGCCTCGCCGGAGCCGCTATOGGAAGCGT^ 

HepCla #133 

D 1 " p _?_ 1 C HVLSDFKTMLKAKLMPQLPGIPP 
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGAT7/n^ 

HepCla #15 

HS SZVYBAADAI I*HTPGCVP CVRBGNASRC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTC 

bTDFDQGWGPXSYAMGS 
ATTTCGATCAGGGATGGGGACCCATTAGCrATGCCAATGGCTCC 

HepCla #178 

RTBBAIYQCCDLDPQARVAIKSLTBRLYVG 
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACC^ 

HepCla #69 

V S K GWRLLAPITAYAQQTRGLLGCI ITSLT 
GTGTCCAAGGGATGGAGACTGCrCGCCCCTATOVCA^^ 

HepCla #191 

F F SV L I A R DQLBQALDCB IYGACYS I B P L D 
ITCTTTAGCGTCXTGATTGCCAGAGACCAACTGGAACAGC^T^ 

HepCla #142 

C Q V P 8 P B P F TELDGVRLHRFAP PCK PLLRB 
TGCCAAGTGCCTAGCCCTGAGTTTTrCACAGAGCTO 
HepCla #182

TCYIKARAACRAAGLQDCTMLVCGDDLVVI 
ACCTCnTACATTAAGGCTAGGGCTGCCTGTAOGGC^ 

HepCla #86 

IDPHIRTGVRTITTGSPITYSTYGKPLADG 
ATCGATCCCAATATCAGAACCGGAGTGAGAACCATT^ 

HepCla #44 

CNHTRGBRCDLSDRDRSBLS PLLLSTTQWQ 
TGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGA 



HepCla #127 

PGALVVGVVCAAILRRHVGPGBGAVQKMNR 
CCCGGAGCCCTCGTOCTCBGCGTCGTGT^ 

HepCla #149 

HDS POABLZ BAHLLWRQBMGGN ITRVBSBN 
CACGATAGCCCTGACGCTGAGCTCATCGAAGCCA^ 

HepCla #105 

B G V F T G X» T H IDAHPLSQTKQ3GBNFPYLVA 
GAGGGAGTGTTTACCGGACTGACACACATTGACGC1 (JA CTTTCTGTCCCA CACAAAGCAAAOCGGAGAG^ ^ 

HepCla #5 

RGRRQPI PKARRPBGRTHAQPGYPWPLYGH 

AGGGGAAGGAGACAGCCTATCCCTAAGGCTAGGAGACCCGAAG 

HepCla #173 

L 1 V F P D L G V R V c BKMALYDVVS KLPLAVHG 
CTGATTGTGTTTCCOGATCnXSGAGTGAGAff 

HepCla #12 

YATGNLPGCS PSIPLLALLSCLTVPASAYQ 
TACGCTACCGGAAACCTCCCCGGATGC^^ 

HepCla #124 

LGJCVLVDI LAGYGAGVAGALVA FKIHSGBV 
CTGGGAAAGGTCCrGGTCGACATTCTGGCTGGCT 

HepCla #160 

SYSSMPPLBGBPGDPDLSDGSWSTVSSBAG 
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGG^ 

HepCla #150 

RQBMGGNITRVBSBNKVVILDSFDPLVABB 
AGGCAAGAGATGGGCGG A AACATTACCAGACTGCAAAG^ 

HepCla #75 

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD 
GTGQGATGGCCTGCCCCTCAGCGAAGCAGAA^ 

HepCla #88 

GCSGGAYDI I ICDBCHSTDATS I LG IGTVL 
GGCTGTMXXSGAGGOQCTTACQATATCXTTATCT 

HepCla #99 

TFTIBTTTI»PQD A V SRTQRRGRTGRGKPGI 
ACCTTTACCATTGAGACAACCACACTGCCTCAGGATGC^ 

HepCla #40 

DCFRKHPBATYSRCGSGPHITPRCLVDYPY 

HepCla #201 
L A A G V G I YLLPMRAA 

CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAAT 
HepCla #163

ALVTPCAABBQKLPIHALSNSLLRHHNLVY 
GCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAA 

HepCla #132 

ISSBCTTPCSGSWLRDIWDWICEVLSDPXT 
ATCTCCACOGAATGCACAACCCCTTCCTCCGGCTCCTC 

HepCla #134 

WLKAKLMPQLPGIPFVSCQRGYKGVtfRGDG 
TGGCTCAAGGCTAACCrCATGCCTCAGCTCCCCPGA A 

HepCla #41 

SGPMITPRCLVDYPYRLHHYPCTIHYTI P K 
ACCGGACCCTCCfrTCACACCCAC^ 

Artificial Protein: 



VIPVRRRGDSRGSLLSPRPISYLKGSSGGPARRGRBILI/yPADGMVSKGWUXAPITAY 
n»Tn»GAOTAAa»IIllGIJ^SI^PAGHAVGIPRAAVC^^ 

VSABBYVRIRRVGDALYDVVSKL PLAVMGSSYG PQYS PGQRVBP I SWCX90TLQYFL11lVBAQIiHVW\^ PUfVRGBin«VI LNAAS LAGTHCLVS PXVP P 

CFA*YIiLPPIIOJU*HGIiSAFSlJlSYSPOT^ KWBYWLLFLLLADARVCSLM 

HTRPPIiGHWPCC 1 HWSreFTKVCGAPPPTKAMTOYSAPPGDPPQPBYP 

irSLTCTOKIIOVBGEVQIVSSSPPAVPOSP^ 

AGRTTSGLVSLLBVTLTHPVTKYIMIQIS^ 

U>I^IAYFSMVGNWAKVLVVU^AGVDAETH^ 

BPE PDVAVLTSMLTD PSH I TAT AAGRDSVTP I DTTTMAIQJBVPCVQPRKGGR1CPAR YAAQGYKVLVLH PSVAATI/3PGA YMSJOUKTVRNSTGL YHVTN 

DCPNSS I VYBAAOAI LHTSSYGPQYS PGQRVBPliVOAWCSKKTPMGPSmAACGD I INGLPVSARRGRB I LLG PADGMSQLSAPS LXATCTANHDS PD 

ABLI BANLLWHPAIASLKAPTAAVTS PLTTSQTLLFNI IX5LVOA>nC3iaCTPHGPS YDTRCFDSTVTBSD IDBRBI SVPAB I LRXS RRFAQAX PVWAR P 

DYMPAPTUCARMIUfTHFPSVLIARDOiBOAI^ 

TTI^AI^TCLlHIilQMIVDVQYLYlCGRKVPG^ 

AAICGJCYIiFHNAVRTKWWAHIHSVWro^ 

VATRDGKLGDCTMLVOSDDLVVICBSAGVQBDA^ 

BI*TPABTrVRIjRAY>tfrPGI*PVCQDHI^PWPQ PBYDLKLI TSCS SMVSVAHDGAGIOlVYYIiGKVTDTLTCGPADI>«3YI PLVGAPLGGAAAI PLBVIK 

GGRHIiI PCHSKKKCDBIjAAJCLVGGVIAMjAAYC!I*STGCV^ 

SHARPRMPW^SXI^SSTSGITGDNTTTSSEPAP 

DLBDPJDBAQLHVWPPLNVRGGRDAVILU^^ PKARRPEGKVSVAHIXJAG1CRVYYLTRDPTTPLARAAWBSK 

PAPSGCPPDSDAESYSSMPPLEGBPGOPIGGHYVOIiAI IKLGALTGTY 

GVIAGZAYFSMVGNNAJCVLVEGCGMAGWU^PRG 

QUUWIDIXVGSRUfHYPCTIMYTIFKVRKYVG^ 

DGTOXXMAOrrrSGLVSIXTPG^ 

LRRHIDIJ*VGSATLCSALYVGDIX^GS^ 

WTGALVTPCAABBQKLP IALDTBVAASCGGVVLVGI>1ALTLS PYYKKYWMNSTG PTXVCGAP PCVI GGAGNNTUiCPTSVEEACS LTPPHSAJCS KFGY 
GAJCDVRCHARISGIQYIAGI«STUGNPAIASIi^ 

HRTARHTPVHSWLGMI IMPAPTLNARNI UfTHBHLBTTMRS PVFTTBf SS P PA VPQS FQVAHLATPPGSVTVPH PHI BBVAL5TTGBI PPYGKLVPDIT 
JOXLAVTCPIJIlLOASiaJCVPYFVTAALVMAQI 

NBVTLTHPVTKYI»rPCARVAIItSI.TBRIiYVGGPLTOS PVRRR 

GDSRGSIJ^HMWSGTPTIKAYTTGPCr PLPAPK^^ 

RPSWGPTDPWU*SRVL<aC\rcDTLTCG 

A<^PPSWDOMWCCIJRIJCPTLCGIVPAKSVro 

BIWCCSHSYSVIWDQMWCCM 

LSTLPGU APASRGNHV^PTHYVPBS3DAAARVTAI IATLCSALYVGDIXSSVFLVGQLPTPS PRRHS6Vl^CYDA<X!AVr/BLTP AETTVRIJIAYMGW 

VAAQIJUU>GAATAFVGAGIJU»AIGSVGS^ 

RKTKRin'NRRPCDVXFPGGGSOT^ 

TAIABIATKSPGSTTSRSACQRQtaCVTPD^^ 

U)GVIJCLTPIAAAGRU>LSGnrFTAGYSG^ 

QVLDSHYQDVLKliVKAAASKVXANLI^ 

VGSQLPCB PBPDVAVLTSKBVKAAAS KVXAHLLSVBBACS LTP PHSAKGRDAV I LLMCWH PTLVPDITKLLLAVPGPMLTD PSH I TABAAGRRLARG 
SPPSMASSSASPRP1SYIJCGSSGGPIXCPA<^VGIPRAADPDQG»K3PISY 

THCLWMMLLI SQABAALBNLVI LNAASLAGTHI I PDRBVLYRB FDEMEBCSQHLPY I BQGMMLIHLH^JI VD VQYL YCVCS S IASNAI KWEYVSHARP 

rwfwpcuj<laagvgiyllpiiraaaatix:pg^ iklgarrpaoalpvharpd 

YHPPLVBTITCJCPDYBPTAAOTPTATCIll^^ 

GSRSLTPCKWIIJ5SFDPLVAKEDKRBISVPABILRJCSLTGTYVYNHLT 
AliGIMAVAYYRGUDVSVIPTSGDVVVVATDMSADLBVV^ 

VKFPGGGQIVGGVYU«PRRGPRRAIiAHGVRVLBZ)GVNYA QLINTHGS 
MHIHSTAUICNKSUITGWJ^LPYQHKPHSSGCPBRLASC3U^ IQRLH 
GI«SAFSIfTVYHGAGTRTZASPKGPVIOMYTNVDO^LYR 

KlXJVPPIJttNRHRARSVRARlXARGGRAS PL LGGWVAAQIAAPGAATALMI LQAStXKVPYPVRVQGUJUCAiyUl^^ 

PRTCRNMWSGTFP INAYTTGEVALSTTGB I PFYGXAI PLBVI KGGRHW FLTRDPTT P LARAAWETARHT PVNSWLGNI I RVSAEBYVR IRRVGDFHY 
VTGHTTDNIJCCPPVVHCCPU>PPRSPPVPPPRXra 

ITGHVPXVGQLFTPS PRPJWTTQGCNCS I Y PGH I PVGAGLAGAAIGSVGLGKVLVD I LAGYGAGDI HDWI CBVLSDFKTWLKAJCIW PQLPG I PFMSS I 

VYBAADAILHTPGCVPCVREGNASRCSSGCPBR1ASCRRI*TO 

AYAO^TRGUiGCIITSLTTF5VI>IARIX3LBOALOC5I 



Figure 26 (Cont) 



WO 01/090197 



PCT/AU01/00622 




LVCGDDLVVIIDPNIJrrCMmTTGSP 

QAPGALVVGWCAAIIJUUIVGPGEGAVQWM*nUlDSP 

IPKARRPEOm^PCTPWPLYGNLIVPPDI/^VCBKM^^ 

GVAGALVAFKIMSGEVSYSSMPPLEGBPGDPDl^DGSWSTVSSRAGRQEHGGNITRV^^ 

YLVTRHADGCSGGAYDI I ICDBCHSTDATSILGICTVLTPTI ETTTLPQDAVSRTQRRGRTGRGKPG IDCFRXHPEATYSROGSGPWITPRCLVDY PY 

LAAGVGIYIJ,PlfRAAAI,VTPCAAJ^KLPIHAI^ 

GVimGDGSGimiTPRCLVDYPYiaJWYPCTIMYTIPK 

Artificial DNA:



QTQATTCCCPTCAfiGACiAAQ(^^ 

<»GAAG(aSAAATCCTCCTiSCGACaXXrrG^ 

CnXXJCl'lT U frAACCCCTOT^ 

GTCTC OCXrCGAACAGTATQTCXSAAATC^ 
CITTCACTATAGCCCITXXXSUU^^ 
GC gTGCCTC CCCTCAACGTC^G^^ 
TGCT 7 TGCCTGGT ^^ 

QGCTGCCTCTAACCCTCCCCnXTGGAAACCTGGAAG^ 

AACACAACGCCTCCCCTCOGCAATTCXreTTtXSCTCTACC^ 

TCGCCCTCCCCCAAAGGGCTTAOtXrrCTGGATACCGAAGT^ 
ATCACAAGCCTCACCC3GAAGGGATAA<yVATCAGGTC^ 

AAGtXXfltn'ltJtflAGGCCrCACCCATATOGATGC^ 

GCOX^AGGACAACCTCCGGCCTCG'rG'rcC^^ 

CGTOC71\»ACAACCACATGGGTCCTXXrrO 

AATACTTTCTCACAAGGGTCGCCATTTGCGGAAAGm 

CTGGATCTGTCCATOGCTTACTTTAGCATGGTGGGAAACr^ 

GACAAGGCTOGCCAGAGGCTCCCCCCCTAGCATGGCCTC^ 

i Ui lGl LC TX C Ll l^ lOllTn ViU r nXXX rrTGGTATC^^ 

GA<XXrrGACCCT0AC3GTCGCOGTCCT(aU^ 

OU>A«^a ijvii^iaAGAATGAQ^^ 

TCCTGAATCCCTCC^^ 

GACTGTCCCAATAGCTCCATCG^ 

OCTOGTGCAAGOCTGGAAGTCCAAGAAAACC^ 

GCC3GAACTGATTt»GGCTAACCTCCTGTCX3AAC^^ 

CCTCCTGTTTAACATTCTGGGACKSGTC 

C^GftCTtX^CATTGACGAAAGGGAAATCTCCCT 

GACTATATG 1 1 ItXCCCTACCCTCTGGGCTAGGATGATCCTCAT^^ 

OST^TOTOCCTCOGGro 

ACCACACTGCCTCCCCTCAGCACAOC^^ 

CrrATCXXXrTCTACGGAATGTVaXXXVKXn 

TCCXXTOTtaXGGATGCTCCGGCCGACCC^ 

GCCGC TATCTCTGGCAAATACCTCTTCAATTGGGCT^^ 

CCTCACCOCTATOGATACCACAATCATGGCCAAAAAOGAATTCACAC^ 

CCTGCTXAGCCAATC^CACACAC^ltLTi ^ 

GTGGCTACCAGAGACGGAAAGCTCCACGATTGCACAATG^ 

o rcgcccTc aflpre^^ 

TTCTGTOCTAOGATACCAGATGCTTTGACTCCACCGTC^ 
G»CCTCACCCCT(XX35AAACCACXC^^ 
CGAATAOGATCIGGAACTGATTACCTCCTGCTCCAGCAATGTC 
ATACCCKACCnnTX XrrrTG CCGAT^^ 

CCCQQAAQGCATCTGAT TTTCTGTCACTCCAAGAAAAACTCrGACGAACTGGCIGCCA 
CCTCMraCAGCCTCTCTGGTCAT^^ 

GAQCCrTTACCGAAGCC ATGACCAGATACTOO GCCCCTCCCQGA^ 

AGraTGCCACACCCAGATGCTTTTGGTTT^ 

CCCTAGCGGATGCCCTCCCGATAGCGATCCCGAAAGGACAC^ 

GACMAGACCCTCCGGC^TGTTCGATGTGAGAA 

GACCTCGAGGATA<XSGATGACGCTCAGCTCCACGTCTtX^ 

<^TCCCAaVCT«XyKm3K3ATC^ 

CAAAanxaGCGTOXX^ 

CCCgCrCCCTCCGGCTGTCCCCCTGACTO 

ATTGCCTATTTCTCCATGGTCCGCAATTG 
CTCGGGACCCACAGACCCTACGAGAAGGTCCA^ 
CAGGCCATA*XLMTGGCCTGCX^ 
CACCTCACCUraCACATra

OGGA<nGGAACACAGACTGGAAGCCXXTGTGlU"riXXX»l'CCAG 

TCAGQCrrCTCCGAAAAGATGATGGGATACATTCCCCTCGTCGGAGC^ 

GAX2GGAGTGAATCGCGGAAACGCTGGCAGAACC!ACAAGCG 

CXMACTGCCTCTtXrrCAGCTXnt^fGA^ 

GAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATC^ 

OVAAAOCACAAAQGTCCCCXXrrGCCrATXXXXk-'r^ 

GGCCCCTCTAOGGAAACGAACXXritritXX^^ 

TGGACAOXXXrrCTGGTCACC^^ 

GGTOGGCCTCATGGCTCTGACACrGTCCCCCTATTA 

TTGXXSOGaGCOGGAAACAATACCCTCCACT^ 

GGCGCTAACGATCTG^GATGCCATGCCAGAATCT^^ 

GTCTTTCACAGCCGCTCnt^^ 

AAAGCTTCCCAGCCTCTGCATAGCTATAG^ 

CACA<yU\COGCTAGGCATACCCCTGTGAATAGCTGGCTGG 

CAATCItt^AACCAC*ATCAGAAGCC ClG^ ^ 

CTCGCTcexyrcLftCACTcxxrr^^ 

AA<XTCCTGCTOGClt>lVritXXJATO 

GCTCAOGATTCCCCAAOCCATTCTGGATATGATT^^ 

CCCTGGATCCCACATTCACAATOGAAACCA^^ 

AACGAAGTGACACTGACACACCCTGTGACAAA^ 

CCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGATG 

AACACCCTGAGGCTACCTATAGCAGATGCGGAACCTGTG^ 

QGOGATAGCA GAGGCT CCCTGCTCAACATGTCG" 

CACATTC3GCTCTGTGGOwrTCCACCt»TGC^ 

TOCXX^CATACGTCCCCCyuUlGa^TCXXG^ 

AGGCCTA <yrrGQQGCCCI7i CCGATCCCAGA^ 

CCAAAi^VrrAlTU'ritX^TTAC^ 

ATCTtX!CTTACATTGAGCAAGGCATGATGCT^^ 

GCOTVAGCCCCTXXXXXTTAGCTGGGACCAAATGri^^ 

CCCItyTCTATTGCTTTACCCCTAGCCCT 

AATGOATD^GCTCCGAGTXn'ACCACACCCT^ 

GyUX^TGTGGTCTGCTCTAGCATGAGCTATAG 

CCCTCTGCTCTACAGACIGt^AGCCGTCCAGA^ 

CTGTCaUXCTCCCaxaACTGW TO 
OGCTATXXTCGCCACAXlTCltn'AGCGCTCTGTATGT(^^ 

GTCGCJGCCCAACrGGCTGCCCCrGGOGCTC^ 

CTCCACCGCTCTGAATTGCAATGAGTCCCTGAATAC^ 

TCAGGCATCACAATXTTGGTCTACTCCACCACAAGCAG^ 

AOXSAAAACC AAAAGGAATACCAATAGtSAGACC^ 

<^'IiJ*AXTATCACGCTA£0^^ 

TCAACAATACCtt^CCCCCTCrQGCAA^ 

ACCGCTCTQGCTGAGCTCGCCACAAACTLVrrCtAiAA 

GCTCGACTCCCACTATCAGC^TGTGCTCG^^ 

TCCC<XATCCCAATATCXjAA1TCCATTACCT^ 

Clt^TGGarrCCTGAAACrGACACCC^TTG^ 

CTCOGCCTCCAQCCAAGCCGAACTG A TTGCCC^ 

GAGCCTCCGGCGTCCTGACAACCTCCTG<3GGAAA^ 

CAGGTCCTGGATAGCGVTTACCAAGACCT 

GGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCCGAGTGC^ 

CTGAGATTACOGt3ACAOGTCAAGAATGGCACAAT(»CAAT^ 

GTGQGAAGCCAACr GCCTTGCGAAC CCC^ACCrG^ 

TTGACATrACCAAACTGCTCCTGGCT^^ 
AGCCCTC CCTCCAT GtXTTAGCTCC AGCGC IAGC^ 

ATCCCCCTAAGCXTAGGCATGTGGGACCCG^ 

ACCCATIGCCTCTGGATGATGCTCCTX^TTAGCCAAGCOG 

CATItXXGATAGOGAAGTOCrCTACAGAGAGTTTC 

TCCAOCAAAACATTGTGGATGTGCAATACCTCTAC 

ACGTQGTlCTGGTrCTG^CTGCrCCTG 

CATGAGCAAAGCCCATGGCATTGACCCTAACATTAGGACAGGCG^ 

CTAGGAAAATGATTGGCGGAOVCTATGTGCAAATGGCTATCAT^ 

TACAATCCCCCTCTCGTO^GACATGGAA^ 

CCTCTACa^TGCCCCTGGCACAAOGACAATCGCTA^ 

AAATGGAAACX^AACTGATTACCTGGGGCOT 

GGCTCCAGGTCCCTGACACCCTGTAAGGTCGTG^ 

TCGAGCCrGTGTGTACCAGAGGCGTCGCCAAAGCCX?^ 
GCCCTCGGCATTAAOGCTGTGGXnTACTATAGGGGACTGGATGT^ 
CGATCTCXaUU^^CACra
ATACOGGAGAClTfG\CTCOGTUAT^ 
GTGAAATTOCCTGGCQGAGGCCAAATCGTC^^ 

lTATCAATAGOGTCTGGAAAGACCTCCTGGAAACCCCTGGC^ 
TGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTCAA 
CCC TGAQ AQACnX3CTAGCTCTAOGAGACrCA 
TCATCTCCC AGCXTT GAGCCTCCT^^ 
OGCCTCACOGCTTTCTCCTGGACAGW 

CTCCCAGACAC TOCTCTTCAATA ttXTCOGC^ 
TCCTCAAAGT^CTTACTTTOTOAG^ 

CCCAGAACCTGTAGGAATATGTGGACCGGAACCri lVlJC A Tr A A(XXnTACACAACCXX3^^ 

OOGAAAQGCTATCCCTCTGGAAGTGATTAAG^ 

CAGCCAGACACACACCOCrrCAACTCCTOGCTC^^ 

GTGACAGGC^TGACCACAGACAATCTC»A^ 

GAAAACXSACAGTGGTCXTIGACAGAGTCCACCCTCA^ 

GAGACAATAC CACAACCT CCGTGTCCrGCCAA AQO^TA^ 

*TCAO<GGLUilVlX,lTlVTGGTCG^ 

CAiTn^ixA^xrnxxxrrcGCCGGA^ 

TTTGGGATTGGATTTGCGAAGTGCTCAGOGATTTCAAAACCTGCSCTGAA 
GTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTG GC IOI^ 
CXrrCGCXnXX^G»GAAGGCTC»CXX»TTTO»T^^ 
CTGACCTOGACCCTCAGGCTA^ 

GCCTAT(XXX3kACAGACAAG GGGA CTGCTCGCCItn^ 
QC ^TTGCGA AATCra 

CTGGT CTQCGGACA OGATCTGGTOSTC ^CTA 

OtXSAAAGTTTCTGGCTGACGGATGCAATTtXSACAAGGGG 

CAACCXAATGGCAAACCGGACACAGAAT^ 

CA<X»CTOOCGGAGCCCTOCSTGCT 

CGATAGCCCTGACGCTGAGCTCATaSAAGCCAATC^^ 

TTACOBGACTCACACACA TTCA OGCTCAC! TTTCTGTOCCAGACAAAGCAAAGOPGAGAGAATTTCCCTTAC^ 
GAGAGTSTCrcaUSAAAATGGCTCTGTATGAOtntXf 

ccATcrrrcTtxrraxxxrrccn^ 

GGOGTCCCCGGAGCCCTOGTGGCTTTCAAAAT» 
GTCCGACGGAAGCTGGAGCACAGTGTCCAGCGAAG 

TACCfCSlTSACAAGGCATGCCGATC^ 

^ TrQ GCACAGTGCTC^CCrTTACCA 

CTOGCA^^ 

CACCAATAGCCTOCTCAGAOUXIATAACCTOGTCT 

TCTGTGAGGTOTCTCOGACTTTAA^ 

CGACTGTGGAGGGGW^ 

TACCATTTTCAAA 



HepC Savine Cassette Sequences (A+B+C), with specific restriction sites removed, which can be joined
to generate a single expressible open reading frame that encodes the HepC Savine protein above



Cassette A 



TATCTCAAAGGCTCCAGCGGACCCCCTGC 
AC(XTOGA«X:i C ciiG U CTCC ^ 
A<X»AAGTCTCCrT CA GACrCGGAC10CATG^ 
ACAT«aX3AGCCGATACCGCTTXCT 

TCTGOGAATCrTTAOGGCTGCCgrCrGCACAAGCGGACTCGCr^ 

TGGGAACGATTGTGCTCAGCtXSAAACCCrGCXIATTATCCC^ 

TCTACCCCTCTGCCTGCCCCTAACTATACCTrTGC^ 

CGCOGATGCCCTCrAOGATXnX3GTCAGCAAACT CC CTCTO 

GCCAAAGGGT^ 

GATTACGAACCCCCTGTGCTXX^UXGATGCCCrClX^ 
CTGGGCTATCAAATGGGAATACGTCGTtMrrCCIOT

QGCCTCCCCTCQCCAATT(XiTTTG G CTG TA C^ 

ACAGAGGCTATGACAAGGTATAGCGCTCCCCCTGGC^ 

ATACCGAAGTGGCTGCCTCCT 
ATGCATTATCACAAGCCTCACCGGAAGGGATAAOAATCAG 
GTOGAGOGAGAGCTOCW^TTGTtncCAGCTCCOCCCCT^ 

GGGAAQG CC T C TT CA CAOCXCTCACCCATATCC^ 

ACACACGTCACCCC J yOGCAATCXXXXlA^ 

CACCAAATACATTATGACATCX^TT3tfX^ 

ATTTGCGGAAACTATCTCrrTTAACT^ 

Tcmc cKT c ac r TM cr m acKTGGrt^^ 

CXX»AACCCATCT<MiCAAGGCTC^^ 
TCCCTGAAAGCCACATGCACAGCCAATGGCC^ 
ATGGCTCCCOQQACCOOTCTACGCrCTGTATQGCA,^^ 
GCATTXnTSACAGACCCTAGCCATATCACAGCCGAA^ 

CXTCGTCCTGAATCCCTCCGTGCXrnXCACA 

CCGGACltrrAlXZAO?rCACCAATGAC^triCtXAATA<^ 

TCCTACGGATTCXIAATACTCOCCOGGACAGAGAGTCXS^ 

GCCCTGCOGATGGCATGAGCCAACTGTCOQCCCCT 
GAACTGATTGAGGCTAACCTCCTGTGGAACCCTGCC A TTCCC^ 
CACCACAAGCCAAACCCTCCTCnTTAAC 
GCTATGACACAAGGTGTTTCGATAGCACAGTGA^ 

TAGGATGATCCTCATCSACACACTTTTTCTCOGTGC^ 

CCTCGGTCGATGTOGTOGrl&STGGCCACAG^ 

CAlACC^AAAACAAAttXXjATGAGCTOGCO CC lAAQCrOT 

C3GAOnt3GTGCTCCCCTOTtf^^ 

AtCTCCACTATCTGTATAACGGAAGGTQGGTCCCT^ 

CTCCTOGCTCTGCCTCA6AGAOCCTAIAGCCCTA 

CtrettttACCCTATGACATTAT CAT ri TO 

CCGCTATCTGTtXtCAAATACCTCITCAATTGGGC^ 

GATCTOCrCQAOGATACOGTCACC^^ 

CCTGTGTGAGAG AGGGAAAOOCT ACCACATGCTGGG^ 

CAQOQCTOTGGCTGGOGCrCIQGTCQCClTTAA^ 
TGCCTGCCAT rCTG I OC nuCGATACOWGATKITTTGACItX^ 
TATCACTGTTGCGATCTcGAcCCCCAAGAGCTCACC^ 
TGCkXrTCCCCGTCTQCCAACAO^ 

GCAATGTGTCCGTGGCTCACGATGGCGCTGGCAAAACSGGTC^ 
TlTGCOGATCTGATGOGCTATATOCCTCTGGTOaGOGCTCOCXTO^ 

CCGCTCTGGCTGCCTATTGCCTCAGCACXGGCltT^ 

TACOGAAGCCATGACCAGATACTCCGCCCCTCCCXSO 
ftTACTCOCXXGGAGACATTTACCATAGOGTO 

GGATGCCCTCCCGATAGCX^TGCCGAAAGGACACAGAC^ 

TGTGGCTCCCQGAGAGA<^jCCCTCCQG^ 

CCTGTAACTtXACCAGAGGCGAAAGGTGTGACCTCGA 

AATGTGAGAGGCGGAAGGGATGCXXrrCATCCTCCT^ 

AACCTCCGAGAGAAGCCAACCCAGAGGCAGA^ 

(XCATGACGGAGCCGrotAAOAGAOUVr 

AGACCCTATCGGAOGCCATTACGTCCAGATOGCCATTATCAA 
CX^VGACCXrnrCTGGGGACCCAaVGACCXrrAGG 



Cassette B 



ggcggatccaccatgctcgagTGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCC^

GGCCTCXXSACATGATGATGAACTGGACCCCTTGGCTIt^ 

CCACACACCTaWPGACACACATTGACCTCCT^^ 

TTTAAGGTCAGGATGTACGTCGCaSGAGTGGAACACAGACTGGA 1 1 GC GTC CA GCCTGAGAAAGGCGQ 

AAGGAAACCCGCTAGGCTCAICGTCrrCCCTGA 

TGGGAGCCCCTCTGGGAGGCGCTCCCAGAGCCC^ 
GCTGGCRGAACXAaUGOGGACTOGTCWGCCTCCT^
ACTGGCTCTGCTCAGCTGTCTGACAGTGCCTGC^^ 
ACGATTGCtXTTQGCAGAGACAAAA 

CCGCTCIUXSGATACAAAGTGO'CGTGCTCAACCCT^^^ 

QGM^CGMMSOCTGfTOQClOOOCCOG M 

CTACTCCTGGAC*GGCGCTCTGGTCACCX^ 

CCGCTA<X7IXnTXXXX»GT«7rCCT^ 

TCCACOGGATTCACAAAGGTCTGCXXjACLX^^ 

AAGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATA 

GCCATCCCAGAATCTCCGGCATTCJ^GTATCTOGCnaX 

GCTTTCACAGCCGCTtn^ACACAGATTGTGGGAGGO^ 

ATTATCATGTTOCXTrCCCAC^CTGTOGGCCAGAATC^ 

TGTGi riA CCrATAACTCCAtXCCTCCC GC TCT TC CTC A CT 

TGACAGTGCCTCACCCTAACATTGACXiAAG^ 

GATATCACAAAGCTCCTCCTCC CC OTCTTCGGACCC^ 

CACOGCTGCCCTCGTGATGGCCCAACTGCTC 

TGCTCGCCGGATGCAATACCTtntrrGACACAGAC^ 

CTCCCCCAAGACGCTGTGTCCCACGGACC^ 

ACAjCCCTQTQACAAAOTATATCATGACCTGTGCCAGAGT^ 

CCCTCACCAATAGCAGAGCXGAAAACTGTQGCTATA 

CCTACCGATTX^i iiaGGAAACACCCTGA^ 

CACCAGACACGCTGACGTCATCCerc^^ 

GftTQCCACAAGCATTCTGQGAATCX» 
ATACOTCCCC G XAAGCGAT GCCGCro CCAGAGTC^^ 

ACAltXXX ^ TTOGCTGACCTOOCXXXTOACCAAAGG ^ 
TGCCAAAAGCGTCTOCGGACCCGTCTACTG^ 
COGAACAGTTTAAGCAAAACCXrrCTCCGACrGCTCCAGACAT^ 
CCTAQCTQGQACCAAATCTGGAAGrGTCTGATTA 

AGCTCCTGAGAAGGCTCCACCAATGGATT^ 

GATGGCTCCTGGTCCACCGTCAGCTCOGACGCTQGCACAGACG A Tg 

ATGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTt^ 

CXXn*CCAGAATCTGGCTGAGCAATTCAAACAG 

CCTCCaX^tnGCAAACCAATTQ<XaAAAGCTCGA^ 

ATACCTOGOaMACTCmxaCCCTCrCC^ 

TGCCTGACTTCXGACGCTGCCXXTTAGGGTCA^ 

GGAAGCGTCTTCCTCGTQCX3ACAG 

ATGGGCTG0GTGGCT6CCCAAC 
ATlXUCTCOGTGiGGAAGCXGGCAC 
ATTAACTCCACC GC TCT C AATTCCAATCtf^ I rTTA CCAACACAAATTCAA 

T**CGCTCTCrrCC*KCTVCL m rW 

QTCAA tfrritlCXJUjA GGCGGAAGCCAAACCA 
CTKXCTAOOGCTCAGGCTCXgCCTOC^^ 
ACAATACCA CACCOCC TCTOQGAAACTGgr 

ACAGAAAAAGGTCACCTTTCACAGACTGCAAGTGCT 

grCACCGGAATGACAACCGATAACCTCAAGTCT^^ 

CCTGAAACTGACACCCATTGCCGCrGCCGGAAQGCTC^ 

TCTATCACTCC G CCTCCACGCAACCCGAACTG ^ 

QCTAACCATATGTGGAACTTTTGCAGAGCCTCCQ GO grCCTGACA^ 

ACCCAGAGOOGCTltXrAGAGOOGCTtSGCCTCTTO^ 

AGAAGGTGTAG6GCTAGCGGA<ntXrrcACCACAAGCTCTGGO 
TGAGATTACXCGACACCrrcAAGAATGG 

GOCTCCACGAATACCCTGTGGGAACCCAACTGCCTTGOGAA^^ 

AGCSCAGAGACGCTgrGATTCT GC TC A Tgl^ 
TU"riTCCCCCTATCCTCAOa»T^^ 

rccATcccnvacrtccfJSCGcr^ 

TGGCCATGCCGTCGGCATTTTCAGAGCOGCTGACTTT^ 
CCGATCAGAGACC CTATTG CTGGCACTATCCCCCTA 

AATAGGCTCATv^JI rTCGCTAGCAGAGGCAATCACGTOWCXXCTACCCATctcgagtgagaatt cgcc 
Cassette C

ggcggatccaccatgctcgagTGCCTCTGGATGATXXTTCCrGATTAGCCAAGCCGAA^
TCTGAATGCCGCTAGCCTCGCCGGAACO^ 
AGTCTACXXAACACCTCCCCTATAT^^ 
CTCTA0GGAGTGGGAAGCrCCATCGCTAGCIX3<XSCCAT^ 

GCKTITACATGAGaUUttCCC^^ 

eracTKrraGCAITT^ 

TACXiAGATrCOCTCAOSCTCTCCCTXaCT 
JCTATGAQCCTACOG Cro OOOUUi CC T TT C ^ 
ACAAG&ACAATCGCTAGCCCTTGGQCTGACAA^^ 
AATGGAAACCAAACTGATTACCTCQSG^ 

GAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGAGATT^ 



CCAAftaC COTgGA tTTTATXXCTC^^ 

ATTTUUSCTGTOOCTTACTATAGOGGACTGGATGltrrCOGTGAn 
TATCTCCGCCGATCTQGAACTCX^ 
TCTCC3^CCGG»GCCCTCATGACAGGCrAT 
OTgGAtTTTAGCCTCGUlCCCTAACACftAAC^ 

MmmactQCTococMaMGBGGHaoa^ 

OCACAOGCAATCTGCCTGGCTGT A GCTTTACC ATT 1 TOCl LA GOUU^TTCG Q ATACOGAGCCAAACACCTCAGGTGTC 
GCrAGGAAAOCOGTC G CCC A TATCAATAGCGTCTGG 

GCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATG^ 

CTCTTCCTCCTGCTOtXXX^TGCCAGAGTGTGTflCCT 

CGACTGTGAGAITiACGGAGCVll^rr^ 

GCGCriTCTCCTGGACAGTGTATCAOGGACXXXXSAACCAGAA 

ACAAACOTgGAtOVAGACCTCl^CftGATTCGT^^ 

TGJUntrTTACGATGCa^TGCGCTP^ 

CATASGCCTAGGTCOGTCAGAGCCAGACTGCTO GC CAGAQ 

CCCTCCrGAAA GTGCC T T A CTTTCTtS AGAGTCCAAGGCC^ 

CX^CCATGACGATTOTGGGACCCAGAACCTGT^^ 

AGSAGGTCGCCCTCAGCACAACOGGAGAGATTCCCT 

ACCK ^iv r r i iritj tfaaiGaGRccc^ 

TCCIQ6CTOG(XAATATCATTA&2GTCAGC(X3^ 
AGGCATGACCACAGACAATCIGAAATCOCCTCCOCn^O^ 
CCCCTCCCAGAAAGAAAAQQACAGTQCTCCTG^ 
TTTGGCTCCAGCTCCACCTCCGGCATTACC(& 
CTGGAGAGGCGATGGCATTATGCAT^^ 
TGTTTACCI X TJIOCXXTIMXAGMCACIOQ 
GCTCXXXTCC&GCXSMXXXXrXR^ 

AOAC A1 1 I GGGA TTGG A 'rr n AJU AACTG CTCAGCGAT^ 
GCATIt^TlTl'AACXCCMGCAlTtW 

ATCXXXSAOCXIATTAGCTATGCCAATOCCTCCA^ 

GGGTOG^X^TTAAGTOCCT O ACACAGAGACT 

TATGCCCAACAGACAAGGQGACTGCTO 

ACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCT 

AGTTTTTCACKGAGCTCGAOGGAGTGAGACTCCATAGGTTTGCCCCT^ 

ATTAAGGCTAGCGCTCCCTGTAGGGCTtXXXSGACTO 

TATGGATCCCAATATCAGAACOHSAGltjAG^ 

TGGCTGACGGATGCAATTGGACAAGGGGAGAGAGATGCGATC^ 

CTCAGCAOACCCAATGGCA^ 

ACGTCGGCCCT0G0GAA0606CIGTCCAATG6ATGAAO 
CTCXGGAGAGAGGAAATQGGACX&AATATCACM 

TCCCTAAGGCTAGGACSACC06AA0GCAGAACCTGG6CCCAA 
TTTOCO GATCIGGGACTCASAGTOTOI^ 

GCGCrrACCAACTGCGAAAXXrrCCTGGTgGAtAT'^ 

AAAATCATGAGCGGAGACGTCAGCTATAGCTCCATGCCTCCCC^ 

AAGCTGGAGCACAGTGTCCAGCGAAGC^ 

TCXFTCATCCTGCACTCCTTCGATCCCCTGGTRCT 

CCTIGCACATGOGXAAGCTCOGACCTCTACCKX?^ 

TATCTGTGACGAATGCCATAGCACAGACGCTACC^ 

CXACACTOOCTC A OGATGOOGTCAGgCAACC 

AGAAAQCXTCCOGAAGCCACATACTCCAOGTGTGGCTC^ 
AACTGCCTATCAATGCCCTCAGCAAT^

lOTCCGGCTCCTGGCTCAGGGATATCTCGG^ 

GCTCATGCCTCAGCTtXTCCGGAATCCCrrT^ 

GACCCTCX^TCACACCCAGATGCCTOGTG^ 

TTCAAAagatctTGAgtcgacgaattcgcc 
Melanoma Savine design

Two Savines - one containing scrambled melanocyte differentiation Ags
- one containing scrambled melanoma cancer specific Ags

Genes in melanocyte differentiation Savine 

gp100

MDLVLKRCTJJIIAVIGALIAW^ 

GANAS FS IALNFPGSQKVLPDGQVI WVNNTI INGSQVWGGQPVYPQETDDAC I FPDGGPCPSGS WSQKRS FVYVWKTW 
GQYWQVLGG PVSGLS I GTGRAMLGTHTMEVTVYHRRGSRS YVPLAHSS SAFTI TDQVPFS VS VSQLRALDGGNKHFLR 
NQPLTFAIjQIjHDPSGYIJUJADLSYTWDF 
HRPTAEAPNTTAGQVPTTEWGTTPGQAPTABPSGT^ 

EMSTPEATGMTPAEVS IVVI*SGTTAAQVTTTBWVETTARELPI PBPBGPDASSIMSTBSITGSLGPLLDGTATLRLVK 
RQVPLIX?VLYRYGSFSVTIJDIV(XJIESAEILQAVPSGEGDA 

SPACQLVLHQILKGGSGTYCLNVSIiADTNSLAVVSTQLI MPGQEAGLGQVPL IVGIXJ^VLMAVVLASLI YRRRLMKQD 
FSVPQLPHSSSHWLRLPRIFCSCPIGENSPLLSGQQV 

MART 

MPREDAHF I YGYPKKGHGHS YTTAEEAAGIG ILTVIIXJVL^ 
FDHRDSKVSLQEKNCEPWPNAPPAYEKLSAEQSPPPYSP 

TRP-l 

PAFLTWHRYHLLRIiEKDMQEMLQEPSFS I S PNSVFSQWRWCDSLED 

YDTLGTLCNSTEDGPIRJW^AGNVARPMVQRIjPEPQDVAQ 

RSLHNIiAHLFLNGTGGQTHLS SQDPI FVLLHTFTDAVFDEWLRRYNAD ISTFPLI^APIGHH^ 
MFVTAPDNLGYTYE 

Tyros 

MLLAVLYCIXWSFQTSAGHFPRACVSSKNIJfEKEC^^ 

SWPSVFTfHRTCQCSGNFMGFNCGNCKFGFWGPMCraRRI^ 

GQMKNGSTPMFNDINIYDLFVWMHYYVSMDALIX^SBIWRDIDFAHBAP 

YWDWRDAEKOTICTDEYMGGQHPTNPNLI*SPASFFS 

PSSADVEFCI^LTQYESGSMDKAANFSFRNTLBGFASPLTGIA^ 

AFVDS IFBQWLQRHRPLQEVYPBANAPIGHNRES YMVPFIPLYRNGDFFI SSKDLGYDYSYLQDSDPDSFQDYIKS YI» 
EQASRIWSW^LGAAMVGAVLTALLAGLVSI*^^ 

TRP2 

MSPLHWGFLLSCLGCKILPGAQGQFPRVCMTVDSLVNKECCPR 

NQDDRBLWPR KF FHRTCKCTGN7 AG YNCGIXTKFGWTG PN CERKKP PV I RQN I HS LS PQEREQ FLGALDLAKKR VH PD Y 

VITTQHWLGLLGPNGTQPQFANCSVYDFFVVr^^ 

IGNBSFAI^YWNFATCRNECDVCTDQLFGAARPDDPT^^ 

GRNSMKLPTLlKDIRDCLSLQKFDNPPFFQN 

IFVVIJ1SFTDAIFDEVWKRFNPPADAWPQBIAPIGHNRMYNMVPFFP 
WPTIXLVVMGTLVALVGLFVIJJ^LQYRRIJIKGYTPI^ 

MC1R 

MAVQGSQRRUjGS LNSTPTAI PQLGLAANQTGARCLBVS I SDGLFLSLGLVSLVENALWATI AKNRNLHSPMYCFIC 
CIJaSDLLVSGTNVLETAVII^EAGALVARAAV^ S I FYALRYHS IV 

TLPRAPRAVAAIWVASVVFSTLPIAYYDHVAVLLC^ 

FGLKGAVTLTILLGIFFLCWGPFFLHLTLIVLCPEHPTCGCIFK^ ICKAI IDPLIYAFHSQBLRRTLKBV 

LTCSW 

MUC1F 

MTPGTQS PFFIJ^LIXTVLTVVTGSGHASSTPGGBKETSATQRSSVPSSTEKNAVSMTS S VLS SHS PGSGS STTQGQDV 
TIiAPATEPASGSAATWGQDVTSVPVTRPALGSTTPPAHDVTSAPDNK 
MUC1R

NKPALGSTAPPVHNVTSASGSASGSASTLVHNGTSA 

VPPLTSSNHSTSPQLSTGVSFFFIjSFHISNIjQFNSSLEDPSTDYYQEI^ 

VVQLTLAFREGTIKVHDVirrQFNQYinTZAASRY^ 

LIAIAVCQCRRKNYGQLDIFPARDTYHPMSKYPTYHTHGRYVPPSSTD 



NB: MUC1 repeat sequences in the middle of the gene were removed



Genes in melanoma specific Savine

BAGE 

MAARAVFLALS AQLLQARLMKEES PWS WRLEPEDGTALCF I F 
GAGE-1

MSTOGRSTyRPRPRRYVBPPEMIGPMRPEQFSDBVBPATPEEGBPATQRQDPAAAQEGEDEGASAG^PKPEADSQ^ 
GHPQTGCECEDGPDGQEMDPPNPBBvTCTPEBEMRSHYW 

gp100In4

SWSQKRSFVYVWKTWGBGLPSQPI IHTCVYFFLPDHLSFGRPFHLNFCDFI* 
MAGE-1 

HSLEQRSLHCKPEBALEAQQBALGLVCVQAATSSSSPLVIiGTLEEVPTAGSTDPPQSP 

GSSSREBEGPSTSCIIiESLFRAV^TKKVADLVGFLLL^ 

IDVXEADPTGHSYVLVTCLGLSYIX3IiLCT^ 

PRKLLTQDLVQEKYIJSYRQVPDSDPARYEFLWGPRAIAETSYVXVLEYVTKV 
MAGE-3

MPLEQRSQHCKPEEGIiEARGEALGLVGAQAPATBEQEAAS SSSTLVEVTLGBW PQGAS S LPTTMNYP 

LWSQSYEDSSNQEEBGPSTFPDIJBSEFXJAALSRXVABLVHFLI^ 

SLQLVTGIELMEVDPIGHLYIFATCLGLSYIX3LIiGDNQIMPKAGLL^ IVLAI IAREGDCAPBEKIWEEI*S VXJSVFEGR 

EDSILGDPIOaJjTQHFVQENYI^YRQVPGSDPACYB^^ 

BE 

PRAME

MERRRLWGSIQSRYISMSVVrrSPRRLVBLAGQSLLKDBAIAIAALB 

TOiPLGVLMKGQHLHLETFKAVIiIXni^ 

KKRKVDGLSTEAEQPFIPVEVLVDLFLKEGACDI^ 

IEDLEVTCTWKLPTLAKPSPYIX^IMJ^ 

LDQLLRHVMNPLETLSITNCRI^EGDVM^ 

TDDQLLALLPSLSHCSQLTTLSFYGNS I S I SALQSLLQHLIGLSNLTHVliYPVPLES YEM 
EIJXBIX3RPSMVWLSANPCPHCGDRTFYDPEPILCPCFMPN 

TRP2IN2 

LMBTHLSSKRYTEEAGGFFPV^KVYYYRFVIGLRWQW^ 
NYESO1a

MQAEGRGTGGSTGDADGPGGPG I PDGPGGNAGGPGEAGATGGRGPRGAGAARASG PGGGAPRGPHGGAASGLNGCCRC 

GARGPESRLLEFYIiAMPFATPMEABIiARRSIiAQDAPPLPVPGVI^ 

SLLMWITQCFLPVFLAQPPSGQRR 

NYESO1b

MLMAQEAIiAFLMAQGAMLAAQERRVPRAAEVP^ 
LAGE1
MQAEGQ(nKK3STGDATCP«3PGI PDGPG^

GARRPDSRLlX3IiHITMPFSSPMEAKLVRRILSIU)AAPLPRPGAVLKDFTVSGN^ 
SLLMWITQCFLPVFLAQAPSGQRR 



Differentiation Savine Scramble process 



Disease name : melanoma 

Input filename : Diffnucg.txt 

Output filename : Diffmucs.txt 

Number genes : 8

Number segments : 187

Segment length : 30 

Segment overlap : 15 
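
The parameters above (segment length 30, overlap 15) imply a window that advances 15 residues at a time, which agrees with the offsets 1, 16, 31, ... recorded for each gene in this listing. The following is a minimal Python sketch of that windowing, offered for illustration only. The function name, the toy input sequence and the two-alanine padding at each end of a gene (inferred from the "AA" that flanks the first and last segment of each gene in this listing) are assumptions, and this is not the program that produced the figure.

```python
def overlapping_segments(seq, length=30, overlap=15, pad="AA"):
    """Split a protein sequence into overlapping segments.

    Returns a list of (offset, segment) tuples, where offset is the
    1-based start position within the padded sequence. The final
    segment may be shorter than `length`.

    Assumption: `pad` is added to both ends before windowing, as the
    "AA" flanking residues in the listing suggest.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    padded = pad + seq + pad
    step = length - overlap          # 30 - 15 = 15 residues per advance
    segments = []
    start = 0
    while True:
        segments.append((start + 1, padded[start:start + length]))
        # Stop as soon as a window reaches the end of the padded sequence.
        if start + length >= len(padded):
            break
        start += step
    return segments


if __name__ == "__main__":
    # Hypothetical 56-residue toy sequence, not one of the genes listed here.
    toy = "M" + "G" * 55
    for offset, segment in overlapping_segments(toy):
        print(offset, len(segment), segment)
```

With these defaults, each residue away from the ends of a gene falls in two consecutive segments, so any epitope of up to 16 residues is contained intact in at least one segment.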

Segments in original order: 



Gene : gp100
Segment# : 1
Offset : 1
1st Codon : 1

AAMDLVLKRCLLHLAVZGALLAVGATK 
GCOCCTATGGATCTGCTCC ro AAAAGGTGTCTB C TCCACCTO^ 



V P R 



Gene : gp100

Segment# : 2
Offset : 16
1st Codon : 1

VZOALLAVG-ATX 
GTGAT 



VPRHQDWLGVSRQLRTKA 



Gene : gp100

Segment# : 3
Offset : 31
1st Codon : 1

N Q P * L G _Y__ S RQLRTKAMHR 
AACCAAGACTGGCTGGGAGTGTCCAGGCAACrGAGAACCAAAGCCT 



QLYPBWTBAQRL 



Gene : gp100

Segment# : 4
Offset : 46
1st Codon : 1

WHRQLYPB 
TGGAATAGGCAACTGTAT 



WTBAQRLDCMRGGQVSLKVSHD 



Gene : gp100

Segment# : 5
Offset : 61
1st Codon : 1

DCVRGGQVSLXVSHDGPTLIGANASPSZAL 
GACTGTKXyWGAGGCGGACAGGTCAGCCTCAAG^ 

Gene : gp100

Segment# : 6
Offset : 76
1st Codon : 1

GP TLIGAH 
GGCCCTACCCTCATCGGAGCCAAT 



VLPDGQ VZ 
ITGGCCAAGTGATT 



ASFSIALWPPGSQK 



Gene : gp100

Segment# : 7
Offset : 91
1st Codon : 1

HFPGSQKVLPDGQV ZWVHNTI IHGSQVWGG 
AA l T r i UXXX SAAGCCAAAAGGlIXT ^ 



Gene : gp100

Segment# : 8
Offset : 106
1st Codon : 1
WVNHTIINGSQVHGGQPVYPQBTDDACIPP
TGGGTCAACAATACCATTATCAATGGCTCCCAGGTCTGGGGAG^ 

Gene : gp100

Segment# : 9
Offset : 121
1st Codon : 1

QPVYPQ8TDDACI PPOGGPCPSGS 
CAGCCTGTGTATCCCCAAGAGACAGACGATGCCTGTAT 



3 Q K R S 



Gene : gp100

Segment# : 10
Offset : 136
1st Codon : 1

D G G P C PSGSWSQKRSFVYVWKTWGQYWQVL 
GAOGGAGGCCCTTGCCCTAGOGGAAGCTGGAGCCAAAAGAGAAGCTr^ 

Gene : gp100

Segment# : 11
Offset : 151
1st Codon : 1

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM 
TTCGTCTACGTCTGGAAAACCTGGGGCCAATACTGGCAGC 



Gene : gp100
Segment# : 12
Offset : 166
1st Codon : 1

GGPVSGLSXGTGRAMLGTHTMBVTVYHRRG 

Gene : gp100

Segment# : 13
Offset : 181
1st Codon : 1

LGTHTHBVTVYHRRGSRSYV PLAHSSSAFT 
CTGGGAACCCATACCATGGAGGTCACCGTCTACCAT^ 



Gene : gp100

Segment# : 14
Offset : 196
1st Codon : 1

SRSYVPLAHSSSAPTITDQVPPSVSVSQLR 
AGCAGAAGCTATGT GO CTCFGGCTCACTCCACCrCCGCCTTTA^ 



Gene : gp100
Segment# : 15
Offset : 211
1st Codon : 1

ITDQVPrSVSVSQLRALDGGNXHFLRNQPL 
ATCACAGACCAAGTGCCrTlCrCCGTGTOT 

Gene : gp100

Segment# : 16
Offset : 226
1st Codon : 1

A L DGGNKHPLRWQPLTPALQLHDPSGYLAB 
GCCCTCGACGGAGGCAATAAGCA'lXlXX"ltJIGG\ATCA^ 

Gene : gp100

Segment# : 17
Offset : 241
1st Codon : 1

TPALQLHDPSGYLABADLSYTWDFG DSSGT 
ACCTTTGCCCTCCAGCTCCACGATCCCTC 

Gene : gp100

Segment# : 18
Offset : 256
1st Codon : 1

ADLSYTWDFGDSSGTLISRALVVTHTYLBP 
GCCGATCTGTCCTACACATGGGATTTCGGAGACTCCAGOT 
Gene : gp100

Segment# : 19
Offset : 271
1st Codon : 1

LI SRALVVTHTYLB PG PVTAQVVLQAA X PL 
CTGATTACCAGACCXXrrCQTGGTOVCOC3VTACCT 

Gene : gp100

Segment# : 20
Offset : 286
1st Codon : 1

GPVTAQVVLQ 
GGCCCTCntSACAGCCCAAGTGGTOCTG 

Gene : gp100

Segment# : 21
Offset : 301
1st Codon : 1

TSCGSSPVPGTTDGHR PTABAPHTTAGQVP 

Gene : gp100

Segment# : 22
Offset : 316
1st Codon : 1

RPTA8APNTTAGQVPTTBVVGTTPGQAPTA 
AGGCCTACOGCIGAGGCTCOCAATACCACAGCOG^ 

Gene : gp100

Segment# : 23
Offset : 331
1st Codon : 1

TTBVVGTT PGQAPTAB PSGTTSVQVPTTBV 
AOCACAGAGGTC(aqGGAACCACAOCaX» 

Gene : gp100

Segment# : 24
Offset : 346
1st Codon : 1

BPSGTTSVQVPTTBVI STAPVQMPTABSTG 
GACCCTAGaSGAACCACAAGCGTCCAQGTa^ 

Gene : gp100

Segment# : 25
Offset : 361
1st Codon : 1

ISTA PVQM PTABSTGMT PBKVPVS BVMGTT 
ATCKXXCXGCTCCXCTCCAGATCCCCACAOCCGA^ 

Gene : gp100

Segment# : 26
Offset : 376
1st Codon : 1

MTPBKVPVSBVMGTTLABMS TPBATGMTPA 
ATCACAOCCGAAAAOGTCCCOCICACCGAAGTGATGGGC^ 

Gene : gp100

Segment# : 27
Offset : 391
1st Codon : 1

L A B M 8 T P B ATGMTPABVSIVVLSGTTAAQV 
CTGGCTGAGATGAGCACACXrCGAAGCCACAGGCA 

Gene : gp100

Segment# : 28
Offset : 406
1st Codon : 1

BVS IVVLSGTTAAQVTTTBtfVBTTARB LP I 
Gene : gp100

Segment# : 29
Offset : 421
1st Codon : 1

TTTBWVBTTARBLPI PEPBGPDASS I M S T E 
ACCACAACCGAATGGGTCGAGACAACCGCTAGGGAACTGCCT^ 

Gene : gp100

Segment# : 30
Offset : 436
1st Codon : 1

PBPBGPDASSI MSTESITGSLGPLZjDGTAT 
CCCGAACCCGAAGGCCCTGACGCTAGCTCCATCATGAG 

Gene : gp100

Segment# : 31
Offset : 451
1st Codon : 1

SITGSL GPLLDGTATLRLVKRQVPLDCVLY 

Gene : gplOO 

Segment* : 32 
Offset : 466 
1st Codon s 1 

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI 
CTGAGACTGGTCAAGAGACAGGTCCCCCTCGACTG 

Gene : gp100

Segment# : 33
Offset : 481
1st Codon : 1

RYGSFSVTLDIVQG I BSAKILQAV PSGBGD 

Gene : gp100

Segment# : 34
Offset : 496
1st Codon : 1

BSABI LQAVPSGBGDAPELTVSCQGGLPJCB 
CACTCOGCCGAAATCCTCCAG GC TCTG C CTAC^ 

Gene : gp100

Segment# : 35
Offset : 511
1st Codon : 1

A F B LTV8CQGGLPJCBACMBI SSPGCQPPAQ 
GCCTTTGAGCTCACCGTCAGCnrrCAG 

Gene : gp100

Segment# : 36
Offset : 526
1st Codon : 1

ACMEISSPGCQPPAQRLCQPVLPS F A C Q L V 
GCCTGTATGGAAATCTCCACCCCTGGCTGTCA^ 

Gene : gp100

Segment# : 37
Offset : 541
1st Codon : 1

R1>CQPVLPSPACQL»VLHQI LK G G S G T Y C L N 
AGCCTCTGCCAACCCGTCCT GC CT A OCCCTGCCTQTCAG 

Gene : gp100

Segment# : 38
Offset : 556
1st Codon : 1

L H Q I LKGGSGTYCLNVSLADTNSLAVVSTQ 
CTGCATCAGATTCTGAAAGGCQGAAGCGGAACCT^^ 

Gene : gp100

Segment# : 39
Offset : 571
1st Codon : 1
V S L A D 



T H S h A V V 



STQLIMPGQBAGLGQVPL 
^TCATGCCCGGACAGGAAiGCCGGACTGGGACAGGTCCCCCTC 



Gene : gp100

Segment# : 40
Offset : 586
1st Codon : 1

L I M P G Q 
CTGATTAT 



BAGLGQVP 



I* I V G I LLVLMAVVLAS 
\TTGTGGGJ^TCC T CCTt»TCttt^TGCCCtriXX? 



Gene : gp100
Segment# : 41
Offset : 601
1st Codon : 1

I V O I
ATCGTCCGCA'

LLVLMAVVLASLIYRRRLMKQDPSVP 

VTCTATAGGAGAAGGCTCATGAAACAGGATTTCTCCG^ 



Gene : gp100

Segment# : 42
Offset : 616
1st Codon : 1

LIYRRRLHKQDPSVPQLPHSSSHWLRLPRI 
CTGATTrAOU»AGGAGACTGATCAAGCAAX3ACT 

Gene : gp100

Segment# : 43
Offset : 631
1st Codon : 1

QLPHSSSHWLRLPRI PCSCPIGBNS 
CAGCTCCCCOITAGCTCCAGCCATICGCTCAGGC^^ 

Gene : gp100

Segment# : 44
Offset : 646
1st Codon : 1

PCSCPIGBHSPLLSG 
ritlTOI' A CCTGTCCX^TTGCCGAAAACTCCCXC^ 



P L L S G 



Q Q V A A 



Gene : MART

Segment# : 1
Offset : 1
1st Codon : 1
AAMPRBDAH 



IYGYPKKGHGHSYTTABBAA 
ATCCCAAAAAGGGACAOGGACACTCCTACACAACCCCTGAGGAAG 



Gene : MART

Segment# : 2
Offset : 16
1st Codon : 1

KKGHGH SYTTABBAAGIGI L T V 1 
AAGAAAGGCCA1GGCCATAGCTATACCACA<X?CGAAGAGG 



Gene : MART

Segment# : 3
Offset : 31
1st Codon : 1

GIGILTVILGVLLLIGCWYCRRRNGYRALM 
GGCATTGGCATTCTGACAGTGATTLTGGG 

Gene : MART

Segment# : 4
Offset : 46
1st Codon : 1

GCWYCRRRNGYRALMDKS LHVGTQCALTRR 
GGCTCrTGGTATTGCAGAAOGACAAACGGATACAGA 

Gene : MART

Segment# : 5
Offset : 61
1st Codon : 1

DKSLHVGTQCALTRRCPQBGPDHRDSKVSL 
GACAAAAGCCTCCACGTCGGCACACAGTGTGCCCTCAC^

Gene : MART

Segment# : 6
Offset : 76
1st Codon : 1

CPQBGFDHRDSKVSLQBKHCBPVVPNAPPA 

TCCCCTCACGAAGGCTrTGACCATAGGCATAGCAAACTC^ 

Gene : MART

Segment# : 7
Offset : 91
1st Codon : 1
QBXMCBPVV 

Gene : MART

Segment# : 8
Offset : 106
1st Codon : 1

YBJCLSABQSPPPYSPAA 
TAC 

Gene : TRP-1

Segment# : 1
Offset : 1
1st Codon : 1

A A, P * P L LJL H R Y F L R LB KDHQB H h Q B P S , P S 

CCCGCTttXBCTTTOCTCACCTGCC^^ 

Gene : TRP-1

Segment# : 2
Offset : 16
1st Codon : 1

LB KDHQBMLQBPSFSLPYMHFATGKMVCDI 
CTGGAAAAGGATATGCAAGACATGCTGCAAGAGCCTAGCTTT^ 

Gene : TRP-1

Segment# : 3
Offset : 31
1st Codon : 1

L P Y " M p atgkmvcdictdd'lh gsrsmfdst 

CTGCXTTTACrGGAACTTTGCCACAGGCAAA^ 

Gene : TRP-1

Segment# : 4
Offset : 46
1st Codon : 1

CTDDLHGSRSHPDSTLISPHSVFSQWRVVC 
TGCACAGAOGATCTGATGGGCTCCAGGTCCAACr^ 



Gene : TRP-1

Segment# : 5
Offset : 61
1st Codon : 1

L I S P M S V F S QWRVVCDSLBDYDTLGTLCHS 
CTGATTAGCCCTAAL 1 CCGTGTTTAGCCAATGGAGAtSTGGTCT 

Gene : TRP-1

Segment# : 6
Offset : 76
1st Codon : 1

DSLBDYDTLGTLCNSTBDGPIRRHPAGNVA 
GACTCCCTGGAAGACTATGACACACIGGGAACCCTCTG 

Gene : TRP-1 

Segment* : 7 
Offset : 91 
1st Codon : 1 

TBDGPI RR H P A GHVARPHVQRLPBPQDVAQ 
ACCGAAGACGGAOTCATTAGGAGAAACCCTGCCGGAAA

Gene : TRP-1

Segment # : 8
Offset : 106
1st Codon : 1

RPMVQRLPBPQDVAQCLBVGLFDTPPFYSN 
AGGCCTATGCTCCAGAGACTGCCTGACCCTCAGGATC^ 

Gene : TRP-1 

Segment # : 9 
Offset : 121 
1st Codon : 1 

CLBV6LFDTP PPYSNSTHSPRNTVBGYSDP 

TGCCTCCACXTTCGGCCrCTrCGATACCC^^ 

Gene : TRP-1 

Segment # : 10
Offset : 136 
1st Codon : 1 

STH SPRNTVBGYSD PTGKYD PAVRS L H N h A 
AGCACAAACTCCTTCAGAAACWAGTGGAAGCCT 

Gene : TRP-1 

Segmentt : 11 
Offset : 151 
1st Codon : 1 

T G JC Y D P A V RSLHNLAHLPLMGTGCQTHLSS 
ACCGGAAAGTATGACCCTGCCGTOUSGTCC^ 

Gene : TRP-1 

Segment* : 12 
Offset : 166 
1st Codon : 1 

HLFLNGTGGQTHLS3QDPIFVLLHTPTDAV 
CACCTCTTCCTCAAGGGAACCGGAGGCCAAACCCATC1T»^ 

Gene : TRP-1 

Segment* : 13 
Offset : 181 
1st Codon : 1 

QDPIPVLLHTFTDAVPDBHLRRYNADISTF 
CAGGATCCCATTTTCGTCCTGCTCCACAC^TT^ 

Gene : TRP-1 

Segmentt : 14 
Offset : 196 
1st Codon : 1 

FDBWLRRYMADI STPPLBNAPIGHH RQYHM 
TTOGATGAGTGGCTGAGAAGGTATAACGCIXSACATTAGCA 

Gene : TRP-1 

Segment # : 15
Offset : 211 
1st Codon : 1 

PLBHAPI. GHHRQYNMVPFHPFVTNTBMFVT 
CCCCTXXSAGAATGCCCCTATCGGACACAATAGGCAATACAA^ 

Gene : TRP-1 

Segmentt : 16 
Offset : 226 
1st Codon : 1 

VPFWPPVTNTBHFVTAPDMLGYTYBAA 
QTCCCTTTCTGGCCCCCTQTGACAAACACAGAC ATGT ^ 

Gene : Tyros 

Segmentt : 1 
Offset : 1 
1st Codon : 1 

AAMLLAVLYCLLWS FQTSAGHF PRACVSSK 
CCCGCTATtCTOTGGCTGTGCTCTACl^ 



Gene : Tyros 

Segment # : 2
Offset : 16
1st Codon : 1

QTSAGHFPR A C VSSKNLMEKECCPPW3GDR 
CAGACAAGCGCTGGCCATTTCCCTAGG^ 

Gene : Tyros 

Segment # : 3
Offset : 31 
1st Codon : 1 

HLMBKBCCPPWSGDRSPCGQLSORGSCQNI 
AACCrCATGG A AAAGGAATGCrGTCCCCCTTGGTCCG GC GATAGGT^ 

Gene : Tyros 

Segment # : 4
Offset : 46
1st Codon : 1

SPCGQLSGRGSCQHI LLSKAPLGPQFPPTG 
AGC lX ' riXJO GGACAGCTCAGCGGAAGGGGAAGC TGTCAGAATAT 

Gene : Tyros 

Segment # : 5
Offset : 61
1st Codon : 1 

LLSHA PLG PQFPPTGVDDRB SW PSVPYNRT 
CTGCTCAGCAATGCCCCTCTGGGACCCXlA ATTCCCrT T CA ^ 

Gene : Tyros 

Segment # : 6
Offset : 76
1st Codon : 1

VDDR8SWPSVFYHRTCQCSGNFMGFNCGNC 
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTC^ 

Gene : Tyros 

Segment # : 7
Offset : 91
1st Codon : 1

C Q CSC U P M G F If CO HCKPG PWGPNCTBRRLL 

TGOCAATGCTCCGGCAATTTCATGGGCXTTAACTGTGG 

Gene : Tyros 

Segment # : 8
Offset : 106
1st Codon : 1

k r g r ir g 

Gene : Tyros 

Segment # : 9
Offset : 121 
1st Codon : 1 

V R R N I P DLSAPBKDKFPAYLTLAKHTISSD 
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCG 

Gene : Tyros 

Segment # : 10
Offset : 136 
1st Codon : 1 

P F AYLTLAKHTISSDYVI PIGTYGQMKNGS 
TTCTTTGCCTATCTGACACTGGCTAAGCATA 

Gene : Tyros 

Segment* : 11 
Offset : 151 
1st Codon : 1 

Y V I P I G T Y GQMKNGSTPMPNDINI YDLFVH 
TAam^TCCCTATCGGAACCTATGGCCAAATGAAAAACGGA 

Gene : Tyros 

Segment # : 12
Offset : 166
1st Codon : 1

TPMPNDZNZYDLFVNMHYYVSMDALLGGSB
ACCCCTATCTTTAACGAT ATCAATATCTATT^CCT<^^ CGAA 

Gene : Tyros 

Segment* : 13 
Offset : 181 
1st Codon : 1 

H H Y Y V 3 M DALLGGSE Z H R D 1 DFAHBAPAFL 
AT<X!ATTACTATgl , GTCCATtX^TGCCCTtXTO 

Gene : Tyros 

Segments : 14 
Offset : 196
1st Codon : 1 

INRDIDFAHBAPAPLPNHRLFLLRHBQBIQ 

ATCTQQAGGGATATCGATTT CGC T CA OGAAGC C CCTOCCr 

Gene : Tyros 

Segment # : 15 
Offset : 211 
1st Codon : 1 

PWHRLFLLRWBQB IQKLTGDBNFT Z PYMDH 

CCCTGGCACAGACTGTTTCTGCTCAGGTGGGA 

Gene : Tyros 

Segments : 16 
Offset : 226 
1st Codon : 1 

KLTGDBNFTIPYHDNRDABKCDICTDBYMG 



Gene : Tyros 

Segmentf : 17 
Offset : 241 
1st Codon : 1 

R D A B K CD I C TDBYHGGQHPTNPNLLSPASF 
AGGGATGCCX3AAAAGTOTQACATTTGCACAGACGAATACA 

Gene : Tyros

Segment # : 18
Offset : 256 
1st Codon : 1 

GQBPTMPNX.LSPASFFSSHQZVCSRLBB.YN 
QGCCAACACCCTACCAATCCCAATCTGClCAG C CC TGC C' r 



Gene : Tyros

Segment # : 19
Offset : 271 
1st Codon : 1 

PSSNQZVCSRLBBYNSHQSLCMGTPBGPLR 
TTCTCCAGCTGGCAG A TTGTtyrGTAGCAGACTGGA^ 

Gene : Tyros

Segment # : 20
Offset : 286
1st Codon : 1

SHQSLCNGTPBGPLRRNPGNHDKSRTPRLP 
AGCCATCACTCCCIGTCTAAOQGAACCCXrTGA 

Gene : Tyros 

Segments : 21 
Offset : 301 
1st Codon : 1 

RNPGNHDKSRTPRLPSSADVBFCLSLTQYB 
AGGAATCCCGGAAACCATCACAAAAGCACAACCCCTAGCCT^ 

Gene : Tyros 

Segments : 22 
Offset : 316 
1st Codon : 1 

SSADVBFCL6LTQYBSGSMDKAANFSFRNT 

AGCTCOGCCGATGTGGAATTCTGTCTGTCCCTC

Gene : Tyros

Segment! : 23 
Offset : 331 
1st Codon : 1

SGSMDKAANPSPRHTLBGPASPLTGIADAS 
AGOGGAAGCATGGACAAAGCCGCTAACTTTAGC^ 

Gene : Tyros 

Segment! : 24 
Offset : 346 
1st Codon : 1

LBGFA3PLTGIADASQSSHHHALH I Y M N G T 

CTGGAAGGCTTTGCCTCCCCCCTCACCG^ 

Gene : Tyros 

Segment! : 25 
Offset : 361 
1st Codon : 1 

QSSMHNALHI YMHGTMSQVQGSAHDPIFLL 
CAGTCCAGCATGCACAATGCCCTCCACATTTA 

Gene : Tyros

Segment! : 26 
Offset : 376 
1st Codon : 1 

MSQVQGSAMDPIPLLHHAPVDS IPBQHLQR 
ATGTCCCAGGTCCAGGGAACCGCTAACG A l tXtA T^ 

Gene : Tyros 

Segment! : 27 
Offset : 391 
1st Codon : 1 

HHAFVDSIFBQMLQRHRPLQBVYPBAHAPI 
CACCATGCCTTrGTGGATAGCATTTTCG^ 

Gene : Tyros 

Segment # : 28
Offset : 406
1st Codon : 1 

HRPLQBVYPBANAPIGHNRBSYMVPFIPLY 

CACAGACaX^CCAGGAA<;it»TATCtXX 

Gene : Tyros 

Segment! : 29 
Offset : 421
1st Codon : 1 

GHHRBSYMVPFI PLYRNGDPPI S3 KDLGYD 
GGCCftTAACACAGAGTCCTAC A 'i W I ^ 

Gene : Tyros 

Segment! : 30 
Offset : 436 
1st Codon : 1 

RMGDFFISSKDLGYDYSYLQDSDPDSFQDY 
AGGAATGGCGATTrCTTTATCTCCACCAAAGA 

Gene : Tyros 

Segment! : 31 
Offset : 451
1st Codon : 1 

YSYLQDSDPDSPQDYIKSYLBQASRIWSWL 
TACTCCTACCTCCACGATAGCGATCCOGATAgC^ 

Gene : Tyros 

Segment! : 32 
Offset : 466 
1st Codon : 1

IKSYLBQASRIHSWLLGAAMVGAVLTALLA 
ATCAAAACCTATCTGGAACAGGCTAGCAGAATCTGGAGCTG 

Gene : Tyros

Segment # : 33
Offset : 481 
1st Codon : 1 

LGAAMVOAVLTALLA LVSLLCRHKRKQLP 

Gene : Tyros 

Segment* : 34 
Offset : 496 
1st Codon : 1 

GLVSLLCRHKRKQLPBBKQPLLMBKBDYHS 
QGOrrCCntmXXrrGCTCTGCACACACAAAA 

Gene : Tyros 

Segment* : 35 
Offset : 511 

1st Codon : 1

BBXQPLLMBKBDYHSLYQSHLAA 
GAGGAAAAGCAACCCXTTCCTGATGGAGAAAGAGGATTACC^ 

Gene : TRP2

Segment # : 1
Offset : 1
1st Codon : 1

AAMSPLWWGFLLSCLGCKIZ.PGAQGQPFRV 

GCCCXrrATGTCCCCCCTCTQGTQGQG^ 

Gene : TRP2

Segment # : 2
Offset : 16
1st Codon : 1

GCKI LPGAQGQFPRVCMTVDSLVNKBCCPR 
CGCPGTAAGATTCTGCCTGCCCCrCACCGA 

Gene : TRP2 

Segment* : 3 
Offset : 31
1st Codon : 1 

CMTVDSLVHXBCCPRLGABSAHVCGSQQGR 

Gene : TRP2 

Segment* : 4 
Offset : 46
1st Codon : 1 

LGABSAMVCGSQQGRGQCTBVRADTRPNSG 

Gene : TRP2 

Segment* : 5 
Offset : 61 
1st Codon : 1 

GQCTBVRADTRPWSGPYXLRNQDDRBLWPR 
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTG 

Gene : TRP2 

Segment* : 6 
Offset : 76 
1st Codon : 1 

PYI LRNQDDRB L W PRKPFHRTCKCTGNFAG 
CCCTATATCCTCAGGAATCAGGATGACAGAGAGCTCTGGCCTA 

Gene : TRP2 

Segment* : 7 
Offset : 91 
1st Codon : 1 

KPPHRTCKCTGHFAGYNCGDCKPGWTGPNC 
AAGTTTTTCCATAGGACATGCAAATGCACAGGCA^ 

Gene : TRP2 

Segment # : 8
Offset : 106
1st Codon : 1

YNCGDCKPGWTGPWCBRKKPPVIRQNIHSL 
TACAATTGCGGAGACTCTAAGTTTGGCTGGACCGG^^ 



Gene : TRP2
Segment* : 9 
Offset : 121 
1st Codon : 1 

BRKKPPVIRQNI 
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACAT 



SLSPQEREQPLGALDLA 
AGAGCAATTCCTCGGCGCTCTGGATCTGGCT 



Gene : TRP2 

Segment* : 10 
Offset : 136 
1st Codon : 1 

S PQSRBQPLGALDLAKKR 



VHPDYV1 TTQHW 
VTTAOGTCATCACAACCCAACACTGG 



Gene : TRP2 

Segment* : 11 
Offset : 151 
1st Codon : 1

KKRVH PDYVI TTQ 
AAGAAAAGGGTCCACCCTGACTATGTGAT 



HMLGLLGPMGTQ PQPAN 
ITGGCACAOVGCCTCAGTTTGCCAAT 



Gene : TRP2

Segment # : 12
Offset : 166
1st Codon : 1

L G L L G



Gene : TRP2 

Segment* : 13 
Offset : 181 
1st Codon : 1 
CSVYDPPVWL 



H Y Y 



L H Y Y S V 
ATXACTATAGCGTC 



SVRDTIiLGPGRPYRAID 

!AGACACACTt3CTaXXXXTGGCAGACCCTATAGGG^ 



Gene : TRP2 

Segment* : 14 
Offset : 196 
1st Codon : 1 

RDTLLGPGRPYRAI D P SHQGPAPVTWHRYH 
AQGCATACCCItXTGGGACCCGGA*GGCCTTACAGAGCC^ 



Gene : TRP2
Segment* : 15 
Offset : 211 
1st Codon : 1 
PSHQGPAP 



VTHHRYHLLCLBRDLQRLIGNB 

ITCGCATAGGTATCACCTOCTGTGTCTGGAAAGGGATC^ 



Gene : TRP2 

Segment* : 16 
Offset : 226 
1st Codon : 1 

LLCLBRDLQRL.IGNBS PALP VSJS^W NPATGRKB 
CTGCTtrnXXTTCGAGAGAGACXTCCACAGA^ 

Gene : TRP2 

Segment* : 17 
Offset : 241 
1st Codon : 1 

S P A L P Y W M F A TGRNBCDVCTDQLPGAARPD 
ACCmtXtJL'raXXTATTGGAAlTl'tXCfACOGGA 

Gene : TRP2 

Segment* : 18 
Offset : 256
1st Codon : 1 

CDVCTDQLPGAARPDDPTL ISRNSR P S S H B
TGCGATGTGTCrrACCGATCAGCTCrrCGGAGCCGCTAG

Gene : TRP2 

Segment # : 19 
Offset : 271 
1st Codon : 1

D P T L I S RNSRFSSWBTVCDSLDDYMHLVTL 

GACXXT AOCCTCATCTCCAGGAATAGC^ 

Gene : TRP2 

Segment # : 20
Offset : 286
1st Codon : 1

T V C D SLDDYMHLVTLCMGTYBGLLRRMQMG 
ACCGTCTGCGATAGCCTOGACGATTACAATCACCK^^ 

Gene : TRP2 

Segment # : 21
Offset : 301
1st Codon : 1

CNGTYBGLLRRNQMGRNSMKZ.PTLKDI RDC 
TtXAATGGCACATACGAAGGCCTCCTCaGAAGGA^ 

Gene : TRP2 

Segment # : 22
Offset : 316
1st Codon : 1

RNSHKLPTLKDIRDCLSLQKPDMPPPPQH5 

aggaatagc^tgaagctccccacxctgaaagacat™ 

Gene : TRP2 

Segment # : 23
Offset : 331
1st Codon : 1

LSLQKFDHPPPPQNSTFS FRNALBGPD KAD 
CTGTCCCTGCAAAACTTTGACAATCCCCCT mil K^GAATAGCACATTCTCCTTCAGAAACGC^^ 

Gene : TRP2

Segment # : 24
Offset : 346
1st Codon : 1

TP 6F R W A L B GPDKADGTLDSQVMSLHMLVH 

ACCTTTACCTTTAGGAATGCCCTCGAGQGATTCGATAAGGCTGA 

Gene : TRP2

Segment # : 25
Offset : 361
1st Codon : 1

GTLDSQVMSLHMLVHSPLNGTHALPHSAAM 
GGCACACTGGATAGCCAAGTGATCAGCCTCCACAATC^ 

Gene : TRP2 

Segment* : 26 
Offset : 376 
1st Codon : 1

S P L M G T M *_ L PHSAANDPI PVVLHSFTDAI P 
AGCTCTCTGAATGGCACAAACGCrCTGCCTCACTCCG 



L H S P T DAIPOBMMKRPNPPADAWP 

!TGCATAGCTTIACCGATGCCATTrrCGATGAG 

Gene : TRP2 

Segment # : 28
Offset : 406
1st Codon : 1

DBHMKRFHPPADAMPQBLAPIGHNRMYNMV 

GAGGAATGGATGAAGAGATTCAATCCCCCTGCCGATGC^ 



Gene : TRP2

Segment # : 27
Offset : 391
1st Codon : 1

O P I P V V 
GACCCTATCTTTgTGGTC

Gene : TRP2

Segment # : 29 
Offset : 421 
1st Codon : 1 

Q B h A P IGHNRMYNMVp F * F PPVTNBBLPLTS 
CAGGJUICTGGCTCCCATTQGCCATAACAGAATGTATAACRT^ 

Gene : TRP2 

Segment # : 30
Offset : 436 
1st Codon : 1 

PPPPPVTNBBLFLTSDQLGYSYAIDLPVSV 

CLcmTnxvnx \ * ; 'it jA C^^ 

Gene : TRP2 

Segment* : 31 
Offset : 451
1st Codon : 1 

* D Q L G Y 8 Y AIDLPVSVBBTPGKPTTLLVVMG 
GACCAACrGQGATACTCCTACCCTATCGATCT^ 

Gene : TRP2 

Segment* : 32 
Offset : 466 
1st Codon : 1 

BBTPGWPTTLLVVMGTLVALVGLFVLLAFL 



Gene : TRP2 

Segment # : 33
Offset : 481
1st Codon : 1

TLVALVGLPVLLAFLQYRRLRKGYTPLMBT 

AcerrcGTGGCitrroGTOGGCxnvri ^ ^^ 

Gene : TRP2 

Segment* : 34 
Offset : 496 
1st Codon : 1 

QYRRLRKGYTPLMBTHLSSKRYTBBAAA 
CAGTATAGGAGACTGAGAAAGGGATACACACCCC^^ 

Gene : MC1R

Segment* : 1 
Offset : 1 
1st Codon : 1 

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA 
G C O GC TATG GC TGT GC AAGGCTCCCAGAGAAGGCTCCTGQGAA 

Gene : MC1R 

Segment* : 2 
Offset : 16 

1st Codon : 1 

LVS TPTAZ PQLGLAAHQTGARCLBVSI S DG 
CTGAATAGCACACCCACAGOCATTCCCCAA^ 

Gene : MC1R 

Segment # : 3
Offset : 31 
1st Codon : 1 
KQTGARCLBVSIS 



Gene : MC1R

Segment # : 4
Offset : 46 
1st Codon : 1 

LFLSLGLVSLVBHALVVATIAKHRMLHS P M 
CTGTTTCTGTCCCIGGGACTGGTCAGCCTCGTGGAAA 

Gene : MC1R 

Segment # : 5
Offset : 61
1st Codon : 1

VVATIAKNRWLHSPMYCPICCLALSDLLV3 
GTGGTCGCCACAATCGCTAAGAATAGGAAT^^ It XTri A TlTG M f T GC CTCX^^ 

Gene : MC1R

Segment # : 6
Offset : 76
1st Codon : 1 

YCPICCLALSDLLVSGTNVLBTAVI LLLBA 

TA Uivrrrra rriucntw 

Gene : MC1R 

Segment! : 7 
Offset : 91 
1st Codon : 1 

GTNVLBTAVI L L L 
OGCACAAACGTCCTGGAAAC C GCTGTGAT 

Gene : MC1R

Segment # : 8
Offset : 106 
1st Codon : 1 

GALVARAAVLQQLDNVIDVI T C S SMLS SLC 

Gene : MC1R 

Segment! : 9 
Offset : 121 
1st Codon : 1

VIDVITCSSMLSSItCFI/GAIAVDRYISIPY 
GTGATTGACGTCATCACATGCTCCAGCA 

Gene : MC1R

Segment # : 10
Offset : 136
1st Codon : 1 

FLGAIAVDRYXSX FYALRYHS IVTLPRAP R 
TTCCTCGGCGCTATCGCTGTGGATAGGTATATCrCCA 

Gene : MC1R 

Segment # : 11
Offset : 151 
1st Codon : 1 

ALRYHSIVTLPRAPRAVAAXWVASVVFSTL 
CCCCTCAGGTATCACrCCATCGTCACCCTCCC ^ 

Gene : MC1R 

Segment # : 12
Offset : 166
1st Codon : 1

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV 

GCCCTCGOCGCTATCTGGOTCGCrAGXXntXS TC 

Gene : MC1R

Segment # : 13
Offset : 181
1st Codon : 1 

PIAYYDHVAVLLCI»VVPFLAMLVL»MAVLYV 
TTCATT GC C TATTA CGATCACG TC GCCGTCCT GC TCTG 



Gene : MC1R 

Segment # : 14
Offset : 196
1st Codon : 1

V PPLAMLVLHAVI*YVHMLARA CQHAQG IAR 
G ^ I^ITITI 'CCTCGCCATGCTQGTCCTGATG G CCV 

Gene : MC1R 

Segment # : 15
Offset : 211
1st Codon : 1

HMLARACQHAQGIARLHKRQRPVHQGPGLK
CAC^TGCTGGCTAGGGCTTGCCAACACKCrCAGG 

Gene : MC1R

Segment* : 16 
Offset : 226 
1st Codon : 1 

LHKRQRPVHQGPGLKGAVTLTI LLGI PFLC 
CTGCATAACAGACAGAGACCCGTCCAGXZAAG^ 

Gene : MC1R

Segment* : 17 
Offset : 241 
1st Codon : 1 

G A V T L T I LLGI PFLCWGPFPLHLTLIVLCP 

OGCGCTGTGACACTGACAATCCTXCTGCGW ^ 

Gene : MC1R 

Segment* : 18 
Offset : 256 
1st Codon : 1 

NGPPPLHLTLIVLCPBHPTCGCIFKMFHLP 

Gene : MC1R 

Segment # : 19
Offset : 271 
1st Codon : 1 

B H P T C G C I P K H P N L P L A L I I C HAIIDPLIY 
CAGCATCCCACATGCGGATGCATTTTCAAAAACTTT 

Gene : MC1R 

Segment # : 20
Offset : 286 

1st Codon : 1 

LALIICNAI IDPLI YAFHSQBLRRTLRBVL 
CTCCCTCTGATTATCrGTAACGCTATCATTC^^ 

Gene : MC1R

Segment # : 21
Offset : 301 
1st Codon : 1 

AFHSQBLRRTLKBVLTCSHAA 

GCCTrTCACTCXX3WGGAACTGAGAAGGACACTG^ 

Gene : MUC1F

Segment* : 1 
Offset : 1 
1st Codon : 1 

AAMTPGTQS PFFLLLLLTVLTVVTGSGHAS 

Gene : MUC1F

Segment* : 2 
Offset : 16 
1st Codon : 1 

LLTVLT VVTGSGHASSTPGGBKBTSATQRS 
CTGCTCACCGTCCTGACAGTGGTCACC^ 

Gene : MUC1F

Segment* : 3 
Offset : 31 
1st Codon : 1 

STPGGBKBTSATQRSSVPSSTBKHAVSMTS 

Gene : MUC1F

Segment* : 4 
Offset : 46 
1st Codon : 1 

SVPSSTBKHAVSMTSSVLSSHSPGSGSSTT 
AGCGTCCCCTCCAGCACAGAGAAAAACGCTGTGTCCATC

Gene : MUC1F

Segment* : 5 
Offset : 61 
1st Codon : 1 

S V L S S H S PGSGSSTTQGQDVTLA P A T B P A S 
ACOGTCCTGTCCACCCATAGCC CT XC^ 

Gene : MUC1F 

Segment* : 6 
Offset : 76 
1st Codon : 1 

QGQDVTLAPATBPASGSAATWGQDVTSVPV 
CAGGGACAGGATGTGACACTGGCTCCCGCT 

Gene : MUC1F

Segment # : 7 
Offset : 91 
1st Codon : 1 
G S A A T W 



Gene : MUC1F

Segment # : 8
Offset : 106
1st Codon : 1

TRPALGSTTPPAHDVTSAPDNKAA 

kTGTGACAAGCGCTCCCGATAAGAAAGCCGCT 



Gene : MUC1R 

Segment* : 1 
Offset : 1 
1st Codon : 1 

AAHRPALGSTA PPVHNVTSASGSASGSAST 
GCCGCrAACAGACCC GC TCIXX3 GA AGCACAGCX^ 

Gene : MUC1R

Segment # : 2
Offset : 16 
1st Codon : 1 

NVTSASGSASG SASTLVHNGTSARATTT PA 
AACGTCACCTCCGCCTCCGGCI^^ 

Gene : MUC1R

Segment* : 3 
Offset : 31 
1st Codon : 1 

LVHWGTSARATTTPASKSTPPS I PSUHSOT 
CTGGTCCACAATGGCACAAGCGCTAGGGCTACCA 

Gene : MUC1R

Segment* : 4 
Offset : 46 
1st Codon : 1 

S K S T P P S 1 P 3 H HSDTPTTLASHSTKTDASS 
AGCAAAAGCACACCCTTTAGCATTCCCICCCACCX^ 

Gene : MUC1R

Segment # : 5
Offset : 61
1st Codon : 1



Gene : MUC1R

Segment* : 6 
Offset : 76 
1st Codon : 1 

THHSSVPPLTSSNHSTSPQLSTGVSFPPLS 
ACCCATCAC 



Gene : MUC1R

Segment # : 7
Offset : 91
1st Codon : 1

T

ACCTCC 



P Q L 



TGVS FFFLSPH 



I SHLQFHSSLBDP 
^TTAGCAATCTGCAATTCAATAGCTCCXTCXvAAGACOCT 



Gene : MUC1R

Segment # : 8
Offset : 106
1st Codon : 1

FHISMLQFNS SLEDPSTDYYQBLQRDI S B M 
TnXATATCTCOiACCTCCAGTTTAA^ 

Gene : MUC1R

Segment* : 9 
Offset : 121 
1st Codon : 1 

STDYYQ8LQRDISBMFLQI YKQGGFLGLSN 
AGCACAGACTATTACCAAGAGCTCCAGXGAGAC^ 



Gene : MUC1R

Segment # : 10
Offset : 136 
1st Codon : 1 

PLQXYXQGG 
TTCCTCCAGAT 



L G L S 



IKFRPGSVVVQ 



TLA 



Gene : MUC1R

Segment # : 11
Offset : 151
1st Codon : 1

IKFRPGSVVVQLTLAPRB 
ATCAAAT 



GTINVHDVBTQP 
VTCAATGTGCATGACGTCGAGACACAGTTT 



Gene : MUC1R

Segment # : 12
Offset : 166
1st Codon : 1

PRBGTI HVHDVBTQPHQYKTB 
TTCAGAGAGGGAACCATTAACGTCCA^ 



A A 



R Y N L T I 
ATAACCTCACCATT 



Gene : MUC1R

Segment # : 13
Offset : 181
1st Codon : 1

HOY

T B A A 



RYNLTXSD 
ATACAATCTGACAAT 



VSVSDVPFPFSAQ 



Gene : MUC1R

Segment # : 14
Offset : 196
1st Codon : 1

8DVSVSDVPF PFSAQSGAGVPGWGIALLVL 
AGCGATGTGT0CG1GTCCGACGT CCX! CTTT CCC ^ 

Gene : MUC1R 

Segment # : 15
Offset : 211
1st Codon : 1

SGAGVPGMGI ALLVLVCVLVALAIVYLIAL 
AQCGGACCCQGACTCC!CTGGCTGGGGCATTGCCCTCCT^ 

Gene : MUC1R

Segment* : 16 
Offset : 226 
1st Codon : 1 

VCVLVALAXVYLIALAVCQCRRKNYGQLDI 
GTGTGTG TCC TOGTGGCTCTGGCT A TCCnxrT A CCTCC 



Gene : MUC1R

Segment # : 17
Offset : 241
1st Codon : 1

AVCQCRRKNYGQLD I F PARDTYHPMSBYPT 
GCXXTTCTGCCAATGCAGAAGGAAAAACrATTX^^ 

Gene : MUC1R 

Segment # : 18
Offset : 256
1st Codon : 1

PPARDTYRPHSBYPTYHT 

W'CCCltX!CAGAGACACATACCATCCCATGAGCGAATACCCTACCTAT 



T D R 
VGACAGA 



HGRYVPPSS 



Gene : MUC1R
Segment # : 19 
Offset : 271 
1st Codon : 1 

YHTHGRYVPPSSTDR RPYBKVSAGNGGSSL 
TACCATACCCATGGCAGATTUXTIXXrCCCT^ 

Gene : MUC1R

Segment # : 20
Offset : 286
1st Codon : 1 

SPY B JC V 3 AG NGGSSLSYTNPAVAAASAitLA 
AGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGCTCCOT 

Gene : MUC1R 

Segment* : 21 
Offset : 301 
1st Codon : 1 

SYTNPAVAAASANLAA 
AGCTATACCAAT 
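
The per-gene listings above appear to follow a fixed windowing scheme: offsets advance by 15 residues (1, 16, 31, ...), full segments are 30 residues long, and the first and last segments of each gene carry an extra AA at the outer end. The following is a minimal Python sketch of that scheme. The window size, step and AA padding are inferred from the listing rather than stated in it, and the function name and toy input sequence are hypothetical.

```python
def overlapping_segments(protein, size=30, step=15, pad="AA"):
    """Split a protein into overlapping windows.

    Assumptions (inferred from the listing, not stated in it):
    - `pad` is added to both ends before windowing,
    - windows are `size` long and start every `step` residues,
    - the last window simply runs to the end and may be shorter.
    Returns (segment number, 1-based offset, sequence) tuples.
    """
    padded = pad + protein + pad
    segments = []
    start = 0
    while True:
        segments.append((len(segments) + 1, start + 1, padded[start:start + size]))
        if start + size >= len(padded):   # this window reached the end
            break
        start += step
    return segments


# Toy 50-residue input (not a real antigen sequence).
toy = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEFGHIKL"
segs = overlapping_segments(toy)
for number, offset, seq in segs:
    print(number, offset, seq)
```

With these defaults each segment shares its last 15 residues with the first 15 of the next one, which matches the overlap visible between consecutive entries in the listing.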



Segments in scrambled order: 
gp100 #4

WNRQLYPBWTBAQRLDCWR 

TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCrCA^ 

TRP2 #6 

PY I LRHQDDRBLW PR 1C P PHRTCKCTGN PAG 
CCXriATATCCrCAa^ATCAGGATGAOWgA^ 

Tyros #30 

RNGDFPISSKDLGYDYSYLQDSDPDSPQDY 
AGGAATGGCGATTTCTTIATCTCCAGCAAAGACCTCGGCIAl^ 



TRP-1 #1 
A A P A 



LTWHRYHLLRLBKDMQBM 

KTACCATCTGCTCAGGCTCGAGAAAGACATGCAGGAAAT 



L Q B P S P S 



Tyros #29 

G H K R 
GGCCAT 

TRP2 #16 
h h C L 



BSYMVPPIP 



LYRNGDFFISSKDLGYD 
^TTAGCTCCAAGGATCTGGGATAOGAT 



BRDLQRItlGH 
SAGAGACCrCCAGAGACTGATTGGCAAT 



gp100 #23

TTBVVGTTPGQAPTABPSGTTSVQVPTTBV 
ACCACAGAGGTOGTGGGAACCACACCOGGACAGGCTC^CACACCOG 



MUC1R #9 

S T D Y Y Q B J> Q RDISBMFLQIYKQGGF 
AGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGCGAAA 1UTT1XZTGCAAATCTAT 



L G L S N 



gp100 #36

ACMBI SSPGCQPPAQRLCQPVLPSPACQLV 
GCCTGTATGGAAATCTCCAGCCCTG CC TGTCAGCCTCOCGCrCA^ 



TRP2 #31 
OQLGYSYAI 



D L P V 



VBBTPGWPTTLLVVMG
GACCAACTCXX^TACTarrACGCT^

TRP-1 #7 

TBDGPIRRNPAGNVARPMVQRLPBPQDVAQ 
ACCGAAGAOGGACCCATTACGAGAAACXXrnKXXXSAAA 

TRP2 #3 

CMTVDSLVNKBCCPRLGABSANVCGSQQGR 
TGCATGACCXnXJGACTCCCTGtntlAACAAAGACT^ 

MUC1R #13

MQYKTBAA3RYMLTISDVSVSDVPPPP3AQ 
AACCAATACAAAACXX3U*GCCGCTACCA^ 

TRP2 #1

AAPSPLWWGFLLSC&GC1CILPGAQGQFPRV 
GCCGCTATGTCCCXXCTCTGGTGGGGCTTTCTGCTCA 

gp100 #18

A D L 8 Y T W D P O DSSGTLXSRALVVTHTYLBP 
GCCGATCTtntXTACACATGGGATTTCGCAGA 

gp100 #27

LABHSTPBATGHT PAEVSIVVLSGTTAAQV 
CIGGCTGAGATGAGCACACCOGAAGCCACAGGCATtjACCC^ 

MUC1R #11

I KFRPGSVVVQ LTLAPRBGTI NVHOVBTQF 
ATCAAATTOWGACCOGGAAGCGTCXntXSTCCAGCTCACC^ 

MUC1F #7

GSAATMGQDVTSVPVTRPALGSTTPPAHDV 
GGCrCOGCOOCT A CCTGQQGCC^AAGAOGTCACCTCOGT G 

MC1R #16 

LHJCRQR PVHQG PG LKGAVTLT I LLG I P F I* C 
CTGCATAAGACAOWtSAGAOCOGTCCACCAAGGCrTTGGCCT 

MC1R #20 

L A L» I I C H A I I D PL X YAFHSQB I* R R T LKBVL 
CTtXXTCTGATTATCTGTAACGCTATCATTGACCCT 

TRP2 #7 

K F P H R T C K C T G W P AGYHCGDCKPGWTG PMC 
AAGTTTTTCCATAGGACATCXIAAATGCACAGX 

TRP2 #23 

LSLQKPDNP P F PQNSTPSPRHALEGPDKAD 
CIQTCCCTQCAAAAGTrTGACA ATCCOCCn B CrTrCA GA^ 

MUC1R #4

S K S T P P SIPSHHSDTPTTLASHSTKTDASS 
AGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGA 

MUC1R #1 

A A H R P A LCSTAPPVHNVTSAS GSASGSAST 
GCCGCTAACAGACCCGCTCTGGGAAGCACAGCTC 

TRP2 #21 

CNGTYBGLLRRHQHGRNSMKL PTLKD Z RDC 
TGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAG 

MUC1R #6 

THHSSVPPLTS SMB5TSPQLSTGVSPPFLS 
MC1R #13 

PIAYYDHVAVLLCLVVPFLAMLVLMAVLYV 
TTCATTGCCTATTACGATCAOGTCGCCGTCCTGCTCTGCCTOGTGG^ 

Tyros #16 

KLTGD8NPTIPYNDWRDABKCDICTDBYMG
AAGCTCACCGGAGACGAAAACTTTACCATTCCCTATTG



gp100 #32

L R L V K R Q 
CTGAGACTGGTCAAGAGACA 



V P L D C V 



X* Y R Y G S FSVTX#DIVQG I 
\TACGGAAGCTTTAGCGTCACCCTCGACATTGTGCAAGGCATT 



MUC1R #10 

PLQIYKQGGFLGLSHIKP 
TTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGA 



RPGSVVV.QLTLA 



MC1R #9

VXDVITCSS 
GTGATTGACGTCATCACAT 



H L S S 



I» C P L G A IAVDRYI3 IFY 
VTTGCCGTCGACAGATACATTAGCATTTTCTAT 



Tyros #21 

RHPGNHDKSRTPRLPSSADVBPCLSLTQYB 
AGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAQGCTC 

TRP-1 #14 

FDBWLRRYHADISTPPLRWAPIOHNRQYNM 
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAC 

gp100 #39

VSLADTMSLAVVSTQLIMPGQBAGLGQVPL 
^TCCXrnX^TGTGGTCAGCACACAGCTC^ 



gp100 #20

GPVTAQVVLQAAI 
GGCCClxnXxACAGCCCAAGTGGTCCTGOUWGCCGCTAT 



P L T 



C G S S 



VPGTTDGH 



gp100 #13

LGTHTMBVTVYHRRGSRSYVPLAHSSSAPT 
CTGGGAACCCATACC3KTGGAGGTCACCGTCT 



MC1R #12 

AVAAI MVASVVFSTLFIAYYDHVAV 
GCCCTCGCCGCTATCTCGGTGGCTAGCGTOGTGTTra 



TRP2 #25 

GTLDSQVMS&HHLVHSFLHG 
GGCACACTGQATAGCC^AGTGATGAGtCTCCACAAT^^ 



T M A L P 



L L C h V 



H S A A N 
iTAGCGCTGCCAAT 



MART #4 

GCWYCRRRHGYRALMDKSLHVGTQCALTRR 
ATTGCAGAAGGAGAAACGGATAOWGAGCCCTCA^ 



Tyros #15 
P W H R L 

CCCTGGCACAGA 



FLLRWBQBIQKLTGDBNFTI PYWDW 
5AGATTCAGAAACTGACAGGCGATGAGAATTTCAGAATC 



MC1R #1 

AAMAVQGSQRRLLGSLNSTPTAI PQLGLAA 
GCCXXTATWOTGTGCAAGGCTCCCAGAGAW 

MC1R #5 

VVATIAKNRHLHS P M Y C F I CCLALSDLLVS 
GTCCTCGCCACAATCGCTAAGAATAGGAATCTGCA^ 

Tyros #25 

QSSMHHALH IYHNGTMSQVQGSANDPI F L L 
GAGTCCAGCATGCAGAATGCCCTCCACATTTACATGAACG 



MC1R #6 

YCFICCLALSDLLVSGTNVLETAVILLLBA
TACTGTTTCATTTGCTGTCTGGCTCT

TRP2 #19 

D P T L I S RNSRFSSWBTVCDSLDDYNHLVTL 
GACCCTACCCTCATCTCCAGGAATAGCAGATK^^ 

MUC1F #8 

T R P A L CSTTPPAHDVTSAP0MKAA 
ACCAGACCCGCTCTCGGAAGCACAACCCCTCC^^ 

Tyros #17

RDABKCDICTDBYMOOQHPTHPHLLSPASF 
AGCGATtXTCGAAAAGTGTGACATrTOCACAGACGAATACAT^ 

gp100 #17

TF A L Q h H DPSGY LABADLSYTWDFGDS6GT 

ACCTTTt^CCTCCAGCTOCACGATXXCTCCGGCT»» Z CTGGCTGAQGCTGACCTCAGCTATACCTGGGA L 'l l' .ltA^^ TAGCrCCGGCACA 

Tyros #22

S S A D V B P C L S L TQYBSGSMDKAAHFSFRHT 
AGCKXXiCCGATGTikiAATTCTC 

gp100 #6

GPTLIGANASFSIALNFPGSQKVLPDGQVI 
GGCCCTACCCTCATOGGAGCCAATGCCTCCTTCTCCATOGCTC 

MC1R #18 
M G P F F L H L T L I 

TCGGGAGCCTTTrTCCTCCACCTGAGCCTCAT 

Tyros #7 

CQCSGHFMGFNC GNCKFOFWGPNCTBRRLL 
TGCCAATGCTCCGGCAATTTCATGGGCTTTAACT^ 

TRP2 #34 

QYRRLRKGYTPX.MBTHLSSJCRYTB8AAA 
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTC^ 

TRP-1 #15 

PL B M A P IGHNRQYNM 

OOCCTCGAGAAltSCCCCTATOGGACACAATAGGCAATACAAT 

NGS QVWGG 

5TGTGGGGCGGA 

gp100 #22

RPTABAPNTTAGQVPTTBVVGTTPGQA P T A 
AGGCCTACOGCTGAGGCTCCCAATACCACAGCCGGA 

MUC1F #3

S T P G GBKBTSATQRSSVPSSTBKHAVSMTS 
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACa^ 

gp100 #42

LIYRRRLMKQ D P S VPQLPHSSSHMLRLPRI 
CIGATTraCACAACGAgACTGATGAAGCAACACTra 

TRP2 #12 

t» G J* T Q P Q F A M C SVYDFFVHLHYYSV 

rTCGCTAACTGTACCGTCTAOG AT i ILU ItSTCTGG CTCCATTACTATAGCGTC 




gp100 #1

AAHDLVLKRCLLHLAVIGALLAVGATKVPR 
GCCGCTATGGATCTGGTCCTCAAAAGGTGTCT GCT 

MC1R #3 

HQTGARCLBVS ISOGLFLSLGLVS L V B H A L 



Figure 27 (Cont) 



WO 017090197 



PCT/AU01/00622 




AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATrAG 
Tyros #23 

SGSMDKAANFSFRNTLBGFASPLTGIADAS 
AGCGGAAGCATCGACAAAGCOGCTAACTTTAGCTrTAGGA^ 

Tyros #4

SPCGQLSGRGSCQNILLSWA PLGPQFP FTG 

Tyros #13 

MHYYVSKDALLGGSBIWRDI DFAHBAPAFL 

ATGCATTACTATGTGTCCATGGATGCCCTCCTCGGAGGCTCC^ 

Tyros #35 

BBKQPLLMB KBDYH SLYQSH L A A 
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGC 

TRP2 #5 

CQCTBVRADTRPWSGPYILRNQDDRBLWPR 
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGG 

MUC1F #4 

SVPSSTBKNAVSHTSSVLSSHSPGSGSSTT 
AGOGTCrCCTCCAGCACACWGAAAAACGClXnxn^ 

Tyros #12 
TPHPNDINIY 

ACOCCTATOTTTAAOGATATCAATATCTAT 

gp100 #9

QPVYPQBTDDACIFPDGGPC PSGSNSQKRS 
TRP-1 #6 

DSLRDYDTLGTLCHSTBDGP I R RNPAGNVA 
GACTCCCTGGAAGACTATGACACACTCGGAAOCCTC^ 

gp100 #8

W V H H T I I K G S Q V W C G Q P V Y PQBTDDACIFP 
TGGGTCAACAATACCATTATCAAltJGCTCXXIAGGTCTG 

MART #7 

QBKNCBPVVPNAFFAYBKLSABQS PPPYSP 
CACGAAAAGAATTCCGAACCCGTCGTCCCTAA^ 

gp100 #14

SRSYVPLAHSSSAPTITDQV PFSVSVSQLR 
AGCAQAAGCTATGTQCCTCIGGCTCACTCCAGCTCC^^ 

TRP-1 #2 

LBKDMQBMLQBPSFSLPYNNFATGKNVCDI 
CTGGAAAAGGATATGCAACAGATGCTGCAACAGCCT^^ 

TRP-1 #16 

VPPWPPVTHTBMPVTA PDHLGYTYBAA 
GTGCCTTTCTGGCCCCCTGTGACAAACAC3VGAG^ 

TRP2 #13 

C S V Y D F P V M L H YYSVRDTLLGPGRPYRAID 
TGCTTCGTGTATGAC i 1 1 1 j CGTCT 

Tyros #9 

VRRNIFDLSAPBKDKFFAYLTLAKHTISSD 
GTGAGAAGGAATATCTTTGACCTGAGCGCTCCCGAAAAGG^ 

MART #2

KKGHGHSYTTABBAAG IGZLTVI LGVLLLI 
AAGAAAGGCCATGGCCATAGCTATACCACAGCCX3AAGAGGCTG 

gp100 #11

PVYVHK THGQYWQVLG GPVSGLS IGTGRAH
TTCGTCTAOGTCTGGAAAACCTGGGGCCAATACTGGCAG

gp100 #12

COPVSOLSIGTGRAMLOTHTMBVTVYHRRG 
GGCGGACCCGTCAGCGGACTGTCCATCGGAA^ 

gp100 #25

ISTA PVQMPTABSTGMT PBXVPV S BVMGTT 

ATCTCCACCGCTCCCGTCCAGATCCCCAC^^ 

Tyros #19

P S S WQIVCSRLRBYNSHQSLCNQTPBGPLR 
1TCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAAGAGT 

TRP2 #27 

D P 1 P V V L H S F T D A IF D BWMKRFNPPADAWP 

GACCCTATCTTTGTGGTCCTGCATAGCTTTAO XATGCCATTTTCGATGAGTGGATGAAAAGGTTT^^ 

MC1R #15

HMLARACQHAQGIARLHKRQR PVHQGFGLX 
CACATGCTGGCTAGGGCTTGCCAACACGCTC^ 

MUC1F #2 

L L T V . J*_ T V v TGSGHASSTPGGBKBTSATQRS 
CTGCTCACCGTCCTOACACTGGTCACXGGAAGCGGA 

gp100 #44

F C S C P I G R N S PLLSGQQVAA 
TTCTGTAGCTGTCCGATTGGCGAAAACTCCC^ 

TRP2 #24 

TPSPRNALBGPDKADGTLDS QVMSLHNLVH 
ACCTTTAGCTTTAGGAATCCCCTCGA^ 

Tyros #20 

SHQSLCHGTPBGPLRRNPGNHDKSRTPRLP 
AGOCATCAGTCCCTGTGTAAGCX5AACCXXTGAGGGACCCCTC 

TRP2 #30 

PPPPPVTHRBLPLTSDQLGYSYA IDLPVSV 

cixmmx xrrcccGT^^ 

TRP2 #9 

BRKKP PVI RQNIHSLSPQRRBQP LGALDLA 
TRP2 #29 

0 B LAP! GHHRMYWMVPPFPPVTNRB LFLTS 
CAGGAACTGGCTCCCATTGGCCATAACAGAATGTATAAC^ 

gp100 #28

BVSIVVLSGTTAAQVTTTBWVBTTARBLPI 
GAGGTCAGCATrGTGGTCCTGTCtXSGCACAACOGCTGCCCA 

MUC1R #7 

TSPQLSTGVSPPPLSPH I3NLQPHSSLBDP 
ACCTCCCCCCAACTCrrCCACXaX ^ 

MUC1R #19

YHTHG RYVPPSSTDRSPYBKVSAGHGGSSlt 
TACCATACCCATGGCAGATAOCTXXXXXXTAGCTCCACCGA 

MC1R #4 

LPLSLGLVSLVBNALVVATIAKNRHLHS PM 
TRP2 #26 

8FLNGTNALPH5AAHDPIPVVLHSPTDAIF 
AGCTTTCTGAATGGCACAAACGCTCTGCCTCAC^^ 

MUC1R #17 

AVCQCRRKNYGQLDIPPARDTYHPMSBYPT
GCCGTCTCCCAATGCAGAAGGAAAAACTAT^

MC1R #14 

VFFLAMLVLMAVLYVHMLARACQHAQGIAR 
GTGTTTTKXTOCXX^TGCTGGTCCTC^TG^ 

TRP-1 #10

STNSFRMTVBGYSDPTGKYDPAVRSLHNLA 
ACCACAAAlTCirrfCAGAAACACACTGGAA^ 

TRP-1 #3

LPYMMFATGKNVCDICTDDLMGSRSNPD3T 
CTTXXTTACTQGAACTTItXXJlCACGCAAA^ 

gp100 #15

ITDQVP PSVSVSQLRALDGGNXH PLRNQPL 
ATCACACACCAAGTGCCTTIClCOCnCTtXXnXglCC^ 

MUC1R #8

F H -_ M -- L _ Q p H SSLBDPSTDYY Q B1 *QRDISBM 
TTOCATATCTCCAACCTCCAGTTTAACTCCAGCCTCGAGGA^ 

MUC1R #20

SPYBKVSAGNGGSSLSYTNPA 
AGCCCTTAOGAAAAGGTGAGCGCTGGCAAT 

Tyros #11

Y V I PI O TY GQMKMGSTPMFMDINIYDLFVM 

TAOGTCATCCCTXTCQGAAOCTATQCOCAAATGAAAAACGGAAGCACA 

gp100 #37

RLCQPVItPSFACQLVLHQXLKGGSGTYCLN 
AGGCTCTGCCAACCCGTCCTGCCrAGCCCTGC^^ 

gp100 #33

R Y ° fi F _?— Y TI » DIV Q GIBS ABII.QAVPSOBGD 
AGGTATGGCTCCTTCTOOGTOACACrGGATATOGTCCAGGGAATOGAA 

Tyros #27 

HHA PVDS IFBQMLQRHRPLQBVY P B A N A P I 
CAOCATGCCTTTGTGGATAGCATTTTCGAACAGTGGCTGCAAA 

TRP-1 #4 

CTDDLMGSRSHPDSTLXSPNSVPSQWRVVC 
MUC1R #18

P PA RDTYHPMS BY PTYHTH GRYV 
TTCCCTGCCAGAGACACATACCATCCCATGJVG^ 

MUC1R #21 

SYTMPAVAAASAHLAA 

AGCrATACCAATCCCGCTGTGGCTGCCGCT?VGOGCTAACCTOGOCGCT 

MC1R #19

BBPTCGCIPKHPNLFLALIXCHAI I D P L I Y 
GAGCATCCCACATGOGGATGCATTTTCAAAAACITTAACCT^ 

Tyros #26 

HSQVQGSANDPI FLLHHAFVDS I FBQNLQR 
ATGTCCCAGGTCCACXXSAAGOGCTAACGATCCCATrTTCCrCCrGC^ 

TRP2 #22 

RHSMKLPTLKDIRDCLSLQKPDHPPFFQNS 
AGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTC 

gp100 #19

LISRALVVTHTYLBPGPVTAQVVLQAAI PL 
CTGATIAGCAGAGCCCTCGTQGTCACCCATA^ 

TRP2 #17 

SFALPYtfHFATGRHBCDVCTDQLFGAARPD
TATTQGAATTTCGCTACCGGAAGGAATGAG^

gp100 #2

V 1 G A L _ L A V 0 ATKVPRHQDMLOVSRQLRTKA 
CTGATTOXOCTCTGCraKXCTCGGCCCT 

gp100 #16

ALD O O N X H PL RHQPLTFALQLHD P S G Y L A B 
GCCCTCGACGGAQGCAATAACCATTTCCTCAQGAATCAGCCT 

TRP2 #18 

CDVCTDQliFGAARPDDPTLI SRNSRPSSWE 
TOCGATOTGTGTACCt^TOWGCTCTTCa 

MART #1 

AA MPRBDAHPIYGYPKKGHGHSYTTABBAA 

GOOGCfAl'OCCl i A<XjGAA < SA CU CIXJUrTTTATC^ vC.lV- wGCTGAGGAAGCCGCT 

TRP-1 #11 

T G X Y D P A V R 9 L HNLAHLFLHGTGGQTHLSS 

ACCGGAAAGTATGACCCTCOCXnX3bCsGTCCCTGCA 

MUC1R #14 

SDVSVSDVPFPFSAQSGAGVPGHGIALLVL 
AOCCATGTCTCCGTGTCa^C maX ^ 

TRP2 #10 

S P Q B R 8 Q F LGALDLAKKRVHPDYVITTQHW 

AGCXCTCAGGAAAGGGAACAGTTTCTGGGAGCXXITCGA 

Tyros #10 

FFA Y L TLAKHTX SSDYVI PIGTYGQHKMGS 

TTCTTTGCCTATCTGACACTGGCTAAGCATACCATTAGCT^ 

MC1R #7 

GTNVLBTAVI LLLBAGALVARAAVLQQLDN 
GGCACAAACGT'OCTGGAAACC G CTGTGATTCTGCTCC^ 

MUC1R #16

VCVLVALAIVYLIALAVCQCRRXNYGQLDI 
GTCTGTGTGCTOtnOGCTCrG GC TATOGTCTACCT 

MART #6 

C P Q B G F DHRDSXVSLQBKRCBPVVPMAPPA 
TCCCCTCACGAACGCTTTCACCATaGOGATAGCAAA^ 



TRP2 #28

DBMMKRFNPPADAWPQBLAPIGHNRMYMMV 

QACGAATGGATGAAGAGATTCAATCCCCCTGCCGATG^^ 

MC1R #21 

AFHSQBLRRTLKBVLTCSWAA 
GCCTTTCACTCCCAGGAACTGAGAAGGM3WrrG^ 



TRP-1 #8 

RPMVQRLPBPQDVAQCLBVGLPDTP PFYSM 
ACGCCTATGGTCOU5AGACTGCCrGA 

TRP-1 #13 

Q D P I F V L LHTPTDAVPDBHLRRYNADI STP 

CAQGATCCCA'lTmX'lVlTGCl^ 

TRP2 #4 

LGAB8ANVCGSQQGRGQCTBVRADTRPMSG
CTGGGAGCCGAAAGCGCTAACGTCTGCGGAAGCCAACAG

TRP2 #8 

YNCGDCKPGWTGPNCBRK1CPPVIRQNIHSL 
TAeAATTGCOGAGACTCTAAGT T T GGC TQ O AC^^ 

TRP-1 #12 

HLFLNCTGGQTHLSSQDPIFVLLHTPTDAV 

Tyros #34 

GLVSLLCRHKR KQLPBBKQPLLMEJCEDYHS 
TRP2 #2 

GCK I LPGAQGQ PPRVCMTVDS LVNK B C C P R 
GGCTCTAAGATTCIGCCTGGCGCTCAG^ 

gp100 #43

QLPHSSSHWLRLPRIFCSCPIGBNS P L L S G 
CAGCIXXCXXATAGCTCCAGCCATTGGCTCAGGCTCCCC^ 

gp100 #10

DO G PCPSG SHS QKRSPVYVWKTNGQ YWQVL 

GACGCACGCCCITCCCCTAGCGGAAGCTOGAGCCAAAAGA^ 

gp100 #3

HQDWLGVSRQLRTJCAWMRQLYPBHTBAQRL 
AACCAAGACTGGCTGGGAGTGTCCAGGCAACItaAGAACCAA 

Tyros #14 

IWRDIDPAHBA PAPLPWHRLPLLRW B Q B I Q 
MUC1F #1

A A M T PGTQS PP PI»LI»LLTVLTVVTG SGHAS 
MART #5 

DKS LHVGTQCALTRRCPQBGPDHRDSKVSL 
GACAAAAGCCTCCACGTOG GC ACACAGTGTGCCC^^ 

MUC1R #2 

MVTSASGSASGSASTLVHHGTSARATTTPA 
AACGTCACCTCCGCCTCCOGCTCCGCCT^^ 

Tyros #24 

LBGFASPI* T G I A D A S Q S S M H HALHZYMNGT 

CTGGAAGGCTTTGCCTCCCCCCTCAjC^ 

TRP2 #14 

RDTLLGPGRPYRAIDFSHOGPAPVTWHRYH

AGQCATACCCTCCTGQGACCCQGAAGGCCTTACACACCCATT^



gp100 #35

AFBLTVSCQGGLPXBACMBISSPGCQPPAQ

GCCTTTGAGCTCAOOGTCAGCTGTCAGGGAGGCCT^

Tyros #6 

VDDRBSMPSVPYMRTCQCSGMPMGPNCQHC
GTGGATGACAGAGAGTCCTGGCCTAGOGTCTTCTATAA

gp100 #34

BSABILQAVPSGBGDAPBLTVSCQGGLPKB 

GAGTCCGCOGAAATCCTCCAGGCTGTOCCTAGOGGAGAGGG 

TRP2 #20 

TVCDSLDDYNHLVTLCNGTYBGLLRRHQMG
ACCGTCTGCGATAGCCTCGACC^

Tyros #5

LLSNAPLGPQP PFTGVDDR8SHPSVFYNRT 

CTGCTCAGCAATCXXXXrrCTGGGACCCC^ 




MART #3 

GICILTVILGVLLLIGCWYCRRRNGYRALM 
GGCATTCXX^TTCTGACAGTGATTCTCOGA^ 

Tyros #31 

YSYLQDSDPDSFQDYIJCSYLBQASRIWSHL 
TACTCCTACCTCCAOGATAOCGATCCOGATAC CTTT^ 

MUC1F #6 

QGQDVTLAPATBPASGSAATWGQDVTSVPV 
CAGGGACA<X5ATGTGAOtfrrGGCTCCCGCTAC^ 

gp100 #21

TSCGSSPVPGTTDGHR PTABAP*NTTAGQVP 



MUC1R #3 

JLVHHGTSARATTTPAS JC S T P PS I PSHHSDT 
TRP2 #32 

SBTPGWPTTLI«VVMGTLVALVGIiFVX#I*APL 
GAGGAAACCCCTCGCTGGCCCACAACCCTCXrr^^ 

gp100 #29

TTTBWVBTTARBLPI PEPBGPDASSIMSTB 
ACCACAACCGAATGGGT<X3AGACAACCGCTAGGGAACTG 

MC1R #17

GAVTLTILLGI PFLCWGPFPI#HLTLIVI#CP 
GGOGCTCTGACACTGACA A TOCTOCTCGGAAl'CT !WTTOCTCT GC T G GGG CX!CTTTCT VI C TGCATCTGACACTGATTGT GC lXritXCCT 

SLLCRHKRKQLP 
ATAAGAGAAAGCAACTGCCT 

MC1R #8

GALVARAAVLQQLDHVZDVITCSSMLS5LC 
gp100 #26

MTPBKVPVSBVMGTTLABMSTPBATGMTPA 
ATGACACCCGAAAAGGTCCCCGTCAGCGAAGTGATCG^ 

Tyros #2

QTSAGHPPRACVSSXWLH 
CAGACAAGCGCTCGCCATTTCCCTAGGGCTTGC^^ 

MC1R #11

ALRYHSZVTLPRAPRAVAAZHVA5VVPSTL 

GCtX"rCAG<jlATCACTCX!ATOGTCACC^ 

MUC1R #12

PRBGTI MVHDVBTQ PHQYKTBAAS R Y N L T I 
TTCAGAGAGGGAACCXTTAACGTCCACGATGTGG 

Tyros #3 

NLHBRBCCPPWSGORSPCGQLSGRGSCQNX
AACCTCXTGGAAAAGGAATGCT

Tyros #32 

IKSYLBQASRIWSWLLGAAMVGAVLTALLA 
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCT^ 

MUC1R #5 

PTTLASHSTKTDASSTHHSSVPPLTSSNHS 



MUC1R #15

SGAGVPGMG IALLVLVCVLVALA I V Y L I A L 
MC1R #10 

FLGAIAVDRY ISIFYALRYHSIVTLPRAPR 
TTCCTCGGOGCTATCGCTGTGGATAGGTAT^^ 

gp100 #40

L I M P G Q K A Q L Q QVPLIVGI LLVLNAVVLAS 
CTGATTATGCCTGGCCAAGAGGCTGGCCTCGGCCA 

TRP2 #33 

TLVALVGLFVLLAPLQYRRLRKGYTPLHBT 
ACCCTOOTSGCTCTtySTOGGCCTCTTCCTCCTCCTCG CCTTTCT 

TRP-1 IS 

LIS PNSVFSQWRVVCDSLEDYDTLGTLCNS 
CIGATTAGCCCTAACTCCGTGTTTAGCCAATGGAGAGTXXn^ 

MC1R #2 

LNSTPTAI PQLGLAAIfQTGARCLBVSISDG 
CTGAATAGCACACCCACAGCCATTCCCXZAACTGGGAC^ 

Tyros #28 

HRPLQBVYPBAHAPIGHHRBSYMVPPIPLY 
CACAGACCCCTCCAGGAAGTGTATtXXDGAA^ 

gp100 #24
B P S G T T 

TRP2 #11 

KKRVH PDYVI TTQHMLGLLG PNGTQ PQFAN 
AACAAAACGGTCCACOCTGA C T AlV r GA TTACCA 

gp100 #38

LHQILKGGSGTYCLMVSLADTMSLAVVSTQ 
CTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTA1UGCC1 

gp100 #30

PBPBGPDASSIMSTBSITGSLGPLLDGTAT 
CCCX3»ACCCGAAGGCCCTGACGCTAG^ 

gp100 #31

SXTGSLGPLLDGTATLRLVKRQVPLDCVLY 
AGCATTACCGGAAGCCTCGGCCCTCTGCTXXSACG 

gp100 #5

D CM RGGQVSLKVSNDGPTLIGANASPSIAL 
GACICTTGGAGACSGOGGACAGGTCAtSCCTC^ 
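
The scrambled segments listed above appear to be joined end to end to give the synthetic protein. A minimal Python sketch of that step follows, under stated assumptions: `scramble_and_join` is a hypothetical name, the shuffle uses Python's `random.Random` with an arbitrary seed, and nothing here reproduces the order or any constraints that the Scramble program itself may have applied.

```python
import random


def scramble_and_join(segments, seed=0):
    """Shuffle labelled segments and concatenate them (hypothetical helper).

    `segments` is a list of (label, residues) pairs. Returns the shuffled
    list and the joined sequence.
    """
    rng = random.Random(seed)   # assumed: any reproducible source of randomness
    shuffled = list(segments)   # copy so the caller's order is left untouched
    rng.shuffle(shuffled)
    synthetic = "".join(residues for _, residues in shuffled)
    return shuffled, synthetic


if __name__ == "__main__":
    # Placeholder labels and residues, not taken from the listing.
    demo = [("geneA #1", "AAMKT"), ("geneA #2", "KTLLV"), ("geneB #1", "AAGHW")]
    order, protein = scramble_and_join(demo, seed=1)
    print([label for label, _ in order])
    print(protein)
```

Keeping the label next to each segment makes it possible to trace any stretch of the joined sequence back to its gene and segment number, as the listing does.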

Synthetic Protein: 

WHRQLYPBKTBAG^UJDCIfRGGOVSIJCVSiroPYILR SSKDLGYDYSYLQDSDPDSPQD YAAPAFLTN 

HRYHLLRLBiaJMQBMLC^PSFSGHNRBSYMWFlPLYRMGDPPI 

FSGTTSVQV PTTBVSTDYYQBLQRDI SBMFLQI YKQGGFLGLSMACHBISS PGCQPPAQRIjCQPVLPSPACQLVDQLGYSYAIDIJ^SVBBTPGWPTT 

LL WMGTEDG P I RRNP AGNVAR PMVQRLPB PQDVAQCXTVDS L VNKBCCPRLGAES ANVCG SQQGRHQ YKTKAAS R YNLTI SDV SVSDVP P P PSAQAA 

MS PLHWGFLLSCUXTCLPGAQGQFPRVADI^^ 

QLT1JVFRBGTINVHDVBTQFGSAATWGC5)VTSVPVT1^ 

SQRUUmJOZVlJCFITIRTOCCTGNFA 

DA8SAANRPALGSTAPPVHNVTSASGSASGSASTCNGTYBG1XRR 
YDHVA VIJ^LVVPFLAMLVLMA VL YVKLTGD ENFT I PYWDWRDAKKCD I CTDEYMGUU^VKRQVPLDCVLYRYGSFSVTIJDIVQG I PbQIYKQGGPLG

LSNIKPRPGSVVVQLTIAVIDVITCSSMI^SLCFIX^IAVDRYISIFYRWP 

PIQiNRQYNMVSLADTNSIJVWSTQLIMPGQB^ 

DWX?THTHBVTVYHRKGSRSYVPLAHSSSAPTAVAAI^^ 

RJUICTIUUJIDKSLIWGTQCALTl^^ PYWDW AAMA VQGSQRRLLGS LNST PTAI PQLGIAA WAT IAKNRNLHS P 

MYCPI CCI*ALSDIXVSQSSMHHAIJ1I YMHGTWS(^QGSAN^ 

AVIIJARADPTLISIWSRFSSMBTVCDSUJD^ 

HPPSGYLaBADLSYIWFC 

PBHPTCGCIFXNPNLPCQCSGVFinFHOCaiCXFGFWG 

TBMFVTN7POSQlCVLPDGQ\riWVllNTIINGSQVWGGW > TA 

YW*RI>OCQPPSVPQIJ>HSSSHWIJU^^ 

IXHIAVIGAIAAVGATICVPRilQ^ 

MAPIX3PQPPPTCMirnrVS>ffiAlXGGSBIWW>IDPAHE^ 

SSTBKWAVSKTSSVLSSHSPGSGSSTTTPMFNDINIYDLPVWMHYY^ 

LCNSTEIXSPIRRHPAGNVAWNHTIIH^ 

VPPSVSVSQIJtLKKDMQEHl^BPSF^PYWPATnK 

VRRNI FDLSAPBKDKPFAYLTLAKHTI SSDKJGGHGHS YTTABBAAGIGI LTVIIX^LLUFVYVWlCTIIGQYWQVLGGPVSGIiS IOTGRAMGG PVSGt*3 

IGTGIWMICTm«VTVYHIUWISTAPVQM^ 

BWKKRFNPPADAWPHMIARACQHAQGIARLHKRQR 

RSALEGPt)KADGTli>SQVMSIiINLVHSHQSLaiGTPTC 

HSLSPQBRBQFWSAI^LAQBIAPIGHNRMYNMVPFFPPVTOBKLFLT^ 

LQFNSSLBDPYHTHGRYVPPSSTDRSPYBKV^^ 

IFAVCQCJUUOnfGQU>IFPAW)TYHP^ 

ATGXHVCDICTDDIJtfSSRStfFDSTITDQW 

LSYTNPAVAAASANIAYVIPIGTYGQMJWGSTPMF^ 

AVPSGEGDHHAFVDSIPBQWIXJRHRPLQBVYPE^ 

SrroPAVAAASANLAABHFTOG£XFlQro 

HPPPFONSLI SRALVVTHTYLBPGPVTAQfVVLQAAI PLSFAI*PYI«n'ATGRNECDVCTDQLFGAAIlPDVI GAUjAVGATKVPRNQDWIXTVSRQIaRTKA 

ALDGGNXHFIJWQPLTFA1£LHD PSGYIABCDVCTDQLFGAAR FDD PTLI SRN SRFSSlfKAAM PRKDAHFIYGYPKKGHGHSYTTABKAATG JCYD P AV 

RSUanAHLF^XHTCgrHLSSSDV^^ 

VIPIGTYGQMlOreSGTNVLKTAVIIXLKAGAiVA^ 

PNAPPASVI^SHSPGSGSSTrQGQDVTIAPATB^ 

HRYHlXCLBRDLQRLICaiBRPMVQRLPBPQDV^ 

BVWUXraPttSGYHCGDCKFGOTGPN^^ 

YHSGCiaWMQ(^FPRVCMrVDSLVHKBCCt^QLPHS8 

GVS RQLRTXAWNRQLYPBWTBAQRI*I WRD I DFAHKAP AFLPWHRLFLLRWBQ E I QAAMT PGTQS PFFLLIXIjTVXiTVVTGSGHAS D KS LHVGTQCAI/T 

RJK^EGFIJKRDSKVSItfVTSASGSASGSAST^^ 

AFVTWHRYHAAMUAVLYCLLWSPQTSAGHFFR 

CESABI U^VPSGBGDAFBLTVSCQGGLPlCBTVCDSIJro^ 

QSPPPYSPAAIVGILLVUIAWLASI^YRIUUJ^^ 

WLOGQO VTLAPATB P ASGSAATWGQDVTSVPVTSCGSS PVPGTITXSHRPTAKAPNTTAGQVPLtVHNGTSARATTTPAS KST PFS I PSHHSDTBBTPGW 

PTTIAWMGTLVALVGLFVIIAFLTITB^^ PBPBGPDASSIMSTEGAVTLTILLGI FFLCWGPFPLHLTLI VLCPLGAAMVOAVLTALL 

AGLVSIXaUUOUCQIJGALVAI^^ 

CPPHSGDRAIAYBSIVTLPRAPJttVAAIWVA^ 

IKSYLEQASRIWSIfLI/jAAMVGAVLTA 

gYTSIFTALRYHSrVTLPRAPRLIMPCQ 

SLKDYDTLGTIOISI^STPTAIPQ^^ 

TABSTGJOCRVHPDWITTQHXLGI>I^PWrrQPQF 

TGSLGPLU)CTATUU,VKRGfWU)an^ 

Synthetic DNA:



TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGACA 

CCTCAGGAATCAGGATGACAGftCACCTCTGGCCTAGGAA ATTC^ 

TTATCTCCAGCAAAGACCTOGGCTATGACTATAGCT^ 

CACAGATACCATCTGCrCACGCTCGACAAAGACATC ff j 
CAT TOXCT CT ACAGAAAaSGA GA LTrr^ ^ 

CCCTCCGGCACAACCTCCGTGCAACICCCT^ 
CTATAAGCAAGGCtX^TTCCTCGGCCTCACCAATCCCT 

ATACTCCtAOGCTATOGATCTGCCrG'rGTCCGTtWAA^ 
5pACCCATTAGGAGAAACCCTGCCCGAAAC^^ 
CQTCCCCCAATGCATGACCGTCGACTCCCre^^ 
GAAACCAATACAAAACCGAAGCOGCrAGCAGATACAATCTGACAATC^ 
ATGTCCCCCCTCTGGTGGGGCTTTCTGCTCAGCTC 
CACATGGGATTTOGG AQAC TCCAGOGGAACCCTCATCT 
AAGCCACACGCATGACCCCTGCCCAACTGTCCATCgrC^ 
CAGCTCACCCTOGCCTTTAGGGAAGGCACAATCAATC^^ 
GCCTGTGACAAGGCCTGCCCTCGGCTCCACCACACC^^ 

GAGCCGTCACCCTCACCATTCTGCTOOGC A ' J II 1 CTl'l ITGTCTCIXXXTItriraTTAT^^ 

AGCCAAGAGCTOUSGAGAACCCTOWSGAAGTGCTCAA G 1*1 A" n \XA TAGGACATGCAAATGCACAGGCAATTTCGCTG GCT 
CAAATTCXX^TGGACAGCXXrTAACTtmTC ITCTTT C AGAATAGCACATTCTCCTT CA GAAACX?C^ 
AAGCCTTT GACAA AGCCyTOG^^

GACGCTACCTCCGCCGCTAACAGACCCGCTCTGGGXAGCACAGC^ 

CACATGCAATGGC\CATACGAAGGCCTCCTGAGAA^^ 

ATCACTCCAGCCTCCXrCCrCTt»CAAGCT^^ 

TACGATX^arrCCCCtmxrrGCTCTGCCTCGT^ 

CTTTACCATTCCCTATTOGGATTG^ 

TCGACTG TCTXTC ^^ 

CTgTC^ CATTAACTT TAGG ^^ 

CTTTCTOGGACCgT^^ 

CTt^antX^lTritXCltJACXXTCACCCAATAa 

CCC3kTTGGCCATAACJWGACAGTATAACATGGTGTCCCTO 

CQGACTQQGACAOOT ^^ 

CAACCGATOGCCATAAGTTTGGCTTTTGG 

GACAAACTOX^ACCCATACCATtt^GGTC^ 

OGTC«xxxriwrcTt^nx*CTAC^ 



ATAGCCAACrroATX^GCCTCCRCAATCTGGTCrAC^^ - ^ 

^GGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGC^ 
CTOgC*C CAAGAI»TTCA<^AACT^^ 
TOCTW a^ A«Xrr CAACTCCACCCCT^ 
ATCTTATTCXrrTTATCTCTTGCCTCGCCCT^ 
CCAA<ntX^A<XXrrCCGCCAATCACCCTATCIT^ 
AMTCGTCT^ 

Q^ g^TCCT CCTgCTaaGt^TGA 

AAAA<^T<W^TTTCX^CAGACGAATACAra ri TCC CCTCCACCTC 

(g^TCXCTCCCCCTaTCTCCCT^^ 

GTCCCTCACAOCT 

CCATCCXT CTGAATTTCC^ 

CCCGAACACCCTACCTGTtXXrrGTATCTTlAAGAATTTCA^ 

OOGATTCXCGGGOCCTAAClGTACOGAAAGCyUjACTGCrcCA 

AAAGGTATACCGAAGAGGCTTC^ 

ACCGAAATGTTTGTGACAAACTTnXCG(y«GCCA^ 

GTGGOGCGGAAGGOrrACCGCTGAGGCTCCCAA 

T*C*<5AAGG*GACra^ 

« - ™ » — - ATTCGCTAACTGTAGOra 

ACTCCACCA^yCTTTAGGAATACCGTCGA^ 

ATCGGAO <ra 
CGATOCC CTCTTCC T CAGCT 

AAcac TctxxrroaxxCTCAto* * rcocrn j*ocqgaatccattactatgtgt<xatggatgccc^ 

GCCAATOCACA<aCGTCAG QGCTGACAC AAOG^ 

TCCACCACAGfrCAAAAACGCTCnTSTCCATGA^ 

TATCAATATCTATa^CCTCTrOCmrr^ 

Ac ^ xtAoi A ivrrraxi*^^ 

CTCTGCAATACCACAGAQC^TQGCCXrrATCAGAAGG^ 

AaX*^ACCOGTCTACCCTCAC(akAACCGA 1 A lXtJTlJA <XSAAAA<5AATTGCGAACCCCT 

A<»AACTtmX»COC^ACA<mX CCCCCTCC^^ 1U A CCATTACCGATCAC 

TACCGGAAAGAATGTGTCTGACA1 TOTCCC TTlVlliOUXlXJUraACAA 

AOGCTGCCIWrrCOCT^TCAC^^ 

GTCAGAAGGAATATCTTTCyiCCTOtfSCGCTCO 

CCATGGCCATA<XrrATACCACAGCCGAX 

AAAttgOG(»XAATACIttX^^ 

ATOCX^ACCQC^AUjGCTATGCTCGGCACACACACAATGGA 

OGAAAGCACAQGCATGACCCCTGAGAAACTGCXrTt^ 

ATAACTCCCACCAA *^^ 

CXAAAOCCCTCTCCATCAOQCATTCG^ 
AAGACACAACCGCTACCCAAA<5CTCCTTCI<^^ 
ACGAATGCCCTOCaCGGATTCGATAACGCretAC^ 
O QSMMDCCCTO 

AAuim i iCTGACAACUJGATCA<X;fOC>^TATA<X^ 
GOCI TTCTTTOC CCXriCT 

CAGA<nxXXnXXAAACCACACCCACACA<XTCCCC^^ 

CTGCAATTCAATACCTCCCTGGAAGACCCTTACCATACCCAT^ 

CGC^AACGGACGCTCCAGCCTCCTgriUr^ 

TCCACTCCCOCATGAGCT1' ICltjAATGGCACAAAOGCTCrGCCTCACTCCXSCCGCTAACGATC 

A7CrTIGC0GTCTCCCAATCCACAAGGAAAJUUrrATGCCO 

G TT T Tr CC T C GCCATOCTGGTCCTG A TCGCCGTCCT^^ 
CCTTCAGAAACACACTGGAAGGCTA^

GCCACAGGCAAAAAQgTCTCCCATATCICTACCGATC 

GTCCGTGTCCCAGCTCAGGGCTCTT^ 

TCGA<^TCCCTCCACCGATTACTAT^^ 

CTtntXTACACAAACCCTGCCCTCGCCGCTGC^ 

CATGTTCAATGACATTAACATTTACGATCTGTTTOTG 

TCAAGGGACGCTCCGGCACATACTGTCT1WIT A 

CCOCyrCCCCTCCGGCGAAGGCGATCACCATCCXrrrW 

GGCTAACGCTCCX!ATTTGCACAGACGATCTGATTO 

ItXil^TCriTCCCTCXCAG^ 

AGCTATACX^ltXX^ICTGGClt^X^^ 

CCTCATCATTIGCAAtTXX^TTATOGATCCCCT ^ 

TOGACTCCATCTTTGAGCAATCGCTCCACACAACGAA 

AACCCTCCCTTTTTCCAA AACTC CCTGA 

GGCTGCCATTCCCCTCAGCTTTGCCCTCCCCTATTC^ 

CXXXTTCGACGGAGGCAATAAGCATTTCCra 
GTCTACCGATCACCTCnm»CCOCXTAGGCCTGA^ 
AAGAOCCTCACrrTATCTATGG CTATCCCAAAA AGGGACAC^ 
AGGTCOCTCCATAACCTOGCC^ 

TCTCCAAAAAGAGAGTGCATCCOGATTACCTCA^ 

GTCATTCCCATTCXrCACATAOQGACAGATGAACAATGGCTO^ 

QGCTAGGGCTGCCgrCCTCCAACAQCTCC^CAATC 

GAAAGAATTAaaatCAGCTCGAiC A 'm ^ 

CCCAATGCCCCTCCOCTA<»CTC^ 

GCCT GC CTCCGAOGAATGGATGAAGAGATTCA A TCCCCCTGCC^ 

TCCCCTTTCACTCCCAGGAACTC*^ 

CATAGCTATCACC TCCTP TGTCro 

ACGAATGGCTCAQGA<^TACAATGCOGATATCTCCA^ 

GAAGTGAGAGCCGATACCAGA(XX^rGGA«3CGATACAATTG^ 

CATCAGACAGAATATCCATAGCCrCCACCnin-A^ 

CCTTTACXXMGCCGTCOC^^ 

TATCACTCOGCCICTAAGA3TCTPCCTO 

ACAGCTCCCCCA TAGCTCC AGCC ATreGCTCA<aX^ 

GCCCTTGCOT 

TCAOGA^C COCC T G OCr rr C ICCCT TO O» TW 

TCTTTCTGCTCCTGCTCCTC^^ 

AGAAGGTCTCCCCAAqACQGATItX8AT^ 

TGCATAACGCTCl^TATCT 

AGCCTGTCTCTCCACCAAAG CL ' l 1 l^ACCTCACCGTCACXrrGTCAGCCa^QC^^ 
CCCCTGCCCAACTOGATCACA<aCA<nxrTOGOT 
TCTGAGTCOBCCGAAATCCTCCAGGCICTGC^^ 
CTGCGATAGCCTCGAOGATTACAATCACCrOGTGACAC^^ 

CCCCTClXaX^CCCCAAritXV llUXJACAOGCGTCGACGATAGGGAAA<XTCGCa^ 

CAAACCCCTCOCCCTTACTCCCCCOC^^ 

GAAACAGGATTTCTCOCn^GCCTGOCATTOCCATTCT^ 

AlA OggT Cn^TGTA CKXra CC TC 

TGGCTCCACXXJiCACGATOTGACACTGGCTCCOCSCTAC^ 

ATCGCACAACCSCTAGG G CTACCACAACCOC roC Xr^ 

cecACAACCCTCCTCCTam fr raa»^^ 

CCCTAGGGAACTGCCTATCCCTG^GCCTGAQGGACCOGATO^ 

TmtxrrCTXTOQGGC LtrxT i v r ric - iuj fc' rcru A^ 

GCe CGAq tSCT CMXCT CCTC^^ 

OGATOTGATTACCTtn^CCTCCAltXTCAOCnCOCTGTC 

CCACCCCTGaGGCTACCGGAATCACACCra 

TGCCCTCCCTGGAGCGGAGACftGAGCCCTCA^ 

GG"fCTlVrLtJUXV'lVrAtJACAGAGGGAACCATTAAOGTO^^ 

TCACCATTAACCTCATGQAAAACGAATGCTGTCCCCCriW 

ATCAAAAGCT A TCTGG A ACAGGCrAGCACAATCTQGAGCT 

CCTCCCCTCCCACTCCACCAAAACO^TGCCTCC^^ 

AOOTATATCTCCATCTTTTAOGCTCTGAGATACCATAGC^ 

CCAACIGCCTCTGATKntXSQAATCC T^^^ 

TTCTCX^ATACAGAAGGCTCAOTAAWXCTATACCCCTCT 

A<XCTCQAGGATTAOG ATACC CTC GGCACA CTGTt?rAACTC 

CGCTAGGTGTCTGGAAGTGTCCATtTCCGACGGACA^ 
ATATCXSTCCCCTTrATCCCTCTCTATCA^
ACCGCTGAgTCCACCOGAAAGAAAAGGGTCCACXrCTGA 
CTTT CCCA ATCTOCATCACATTC^ 
AACCCGAACCOGAA GGCCCTC ACGCTAOT^ 

OCX^CACGTOWCCTCAAGGTCAGCAATGA^ 

Melanoma cancer Specific Savine Scramble process

Scramble - Output File 

Scramble version : 0.1 beta, 08/02/1999 

Num. genes : 10

Num. segments : 121

Segment length : 30 

Segment overlap : 15 
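
The header parameters above, together with the offsets printed for each segment in this listing (1, 16, 31, ...), suggest that each gene is cut into windows of 30 residues that start every 15 residues, so that neighbouring segments share 15 residues. A minimal Python sketch of that windowing follows. The function name `split_into_segments`, the stopping rule and the two-residue `AA` padding are assumptions inferred from this output (first segments begin with AA and last segments end with AA); it is not the Scramble program itself.

```python
def split_into_segments(sequence, length=30, overlap=15, pad="AA"):
    """Cut a protein sequence into overlapping windows.

    Hypothetical helper: window size and overlap mirror the listing header;
    the padding and the stopping rule are assumptions drawn from the output.
    Returns a list of (segment_number, offset, residues), both 1-based.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    padded = pad + sequence + pad
    step = length - overlap
    segments = []
    start = 0
    while True:
        segments.append((len(segments) + 1, start + 1, padded[start:start + length]))
        if start + length >= len(padded):
            break  # this window already reached the end of the sequence
        start += step
    return segments


if __name__ == "__main__":
    # Placeholder 43-residue sequence (not a real antigen): 47 residues once padded.
    demo = "MKT" * 14 + "M"
    for number, offset, residues in split_into_segments(demo):
        print(number, offset, len(residues), residues)
```

For a 43-residue input this yields three segments at offsets 1, 16 and 31, the last one 17 residues long, which is the same shape as the three segments listed for the first gene.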

Segments in original order: 



Gene : BAGE

Segment# : 1
Offset : 1
1st Codon : 1

A AHAARAVPLA LSAQLLQARLMK BBSPVV S 
GCCGCTAT 



Gene : BAGE

Segment# : 2
Offset : 16
1st Codon : 1

LLQARLMKSBS PVVSWRLBPBD GTALCPI F 

Gene : BAGE

Segment# : 3
Offset : 31
1st Codon : 1
HRLBPBDGTALCF IFAA 

Gene : GAGE-1

Segment# : 1
Offset : 1
1st Codon : 1

AAMSWRGRSTY RP RPRRYV B P PBM I G P M R P 

CCCGCTATGTCCIG G ACAGGCAGAAGCACAT^^ 



Gene : GAGE-1

Segment# : 2
Offset : 16
1st Codon : 1

RRYVB PPBHIG PHRPBQFSOBVB PATPBBG 

AGGAGATACGTGGAGCCTCGGGAAATGATTGGCCCTA 

Gene : GAGE-1

Segment# : 3
Offset : 31
1st Codon : 1

BQFSDBVBPAT PB BGBPATQRQD PAAAQBG 



Gene : GAGE-1

Segment# : 4
Offset : 46
1st Codon : 1

BPATQRQDPAAAQBGBDBGASAGQGPKPS A 

GAGCCTGCCACACAGAGACAGGATCOCGCTGCCGCrCA^ 



Gene : GAGE-1

Segment# : 5
Offset : 61
1st Codon : 1

PKPBADSQEQGHPQTCCBCB
^TAGCCAAGAGCAAGGCCATCCCCAAACCGGATGCGAATGC^ 

Gene : GAGE-1

Segment# : 6
Offset : 76
1st Codon : 1

DSQBQOKPQTGCBCBD6PDOQBMDPPNPBB 
OACTCyCAGCAACaGCGACACCCTCAGACAGGCTC^ 

Gene : GAGE-1

Segment# : 7
Offset : 91
1st Codon : 1

DOPDGQBMDPPHPBBVKTPBBBMRSHYVAQ 

GACGGACCGGATGGCCAAGAGATGGACCCICCCAA 



Gene : GAGE-1

Segment# : 8
Offset : 106
1st Codon : 1

V K T __ P _ B B B M R 3 H Y v AQTCILffLLMNMCPLlfL 
CTGAAAACCCCTGAGGAAGAGATCAGGTCCCACrATG^ 

Gene : GAGE-1

Segment# : 9
Offset : 121
1st Codon : 1

TGILHLLMNNCFL 

Gene : gp100In4

Segment# : 1
Offset : 1
1st Codon : 1

AA8W5QKR S F V Y V W KTHGBGLPSQPIXHTC 
GCCGCTAGCTGGAGCCAAAAGAGAAGCTZTGTGTATGTG^ 

Gene : gp100In4

Segment# : 2
Offset : 16
1st Codon : 1

TWGBGLPSQPI IHTCVYFPLPDHL SPG RPF 
ACTrreGGGCGAAGGCCTCCCCTCCCA^ 



Gene : gp100In4
Segment# : 3
Offset : 31
1st Codon : 1

V Y FPLPDHLSP 
GTGTAT 



GRPFHLNFCDPLAA 
TTTCCATCTCAA I TI V I V I Ta^Tr r C l Xj GClGCC 



Gene : MAGE-1

Segment# : 1
Offset : 1
1st Codon : 1

A A M S L B QRSLHCKPBBALBAQQBALGLVCV 
GCCGCTATGTCCCTGGAACAGAGAAGCCTC^ 



Gene : MAGE-1

Segment# : 2
Offset : 16
1st Codon : 1

B A L B A Q Q BALG LVCVQAAT3 S S 3 P LVLGTL



Gene : MAGE-1

Segment# : 3
Offset : 31
1st Codon : 1
QAATSSSSPLVLGTLBE 



VPTAGSTOPPQSP 
Gene : MAGE-1

Segment# : 4
Offset : 46
1st Codon : 1

BBVPTAGSTDPP Q 3 P QGASAPPTTIMPTRQ 
GAGGAAGTGCCTACCGCTtXXnXXACCGATCCCCCTCA^ 



Gene : MAGE-1

Segment# : 5
Offset : 61
1st Codon : 1

Q G A 6 A F P TTIHPTRQRQPSBGSSSRBBBGP
nTAACTTTACCAGACAGACACACCCTAGCCAA^ 



Gene : MAGE-1

Segment# : 6
Offset : 76
1st Codon : 1

RQPSBGSSSRBB8GPSTSCI LBSLFRAVIT 
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGA^ 



Gene : MAGE-1

Segment# : 7
Offset : 91
1st Codon : 1

S T 8 C I L B S L P



ACCACAAGCTGTAT 



Gene : MAGE-1

Segment# : 8
Offset : 106
1st Codon : 1

KKVADLVGFLLIi 
AAGAAAGTGGCTGACCTCGTGGGAT 



1CYRAREPVTKABMI>BSVI 

rATAGGGCTAGGGAACCCGTCACCAAAG<XGAAATGCTC^ 



Gene : MAGE-1

Segment# : 9
Offset : 121
1st Codon : 1

ARB PVTKABMI»BSV I KNYKHCPPB IFGKAS 
GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAA^ 



Gene : MAGE-1
Segment# : 10
Offset : 136
1st Codon : 1
KNYXHCFP 



I FGKAS BSLQLVPGIDVJCBAD 



Gene : MAGE-1

Segment# : 11
Offset : 151
1st Codon : 1

BS LQLVFGIDVKBADPTGHS YVLVTCLGLS 
GAGTCCCTGCAA CTG GTCITCG GA ATCGATGTGAAAGAGCC^ 



Gene : MAGE-1

Segment# : 12
Offset : 166
1st Codon : 1

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CCCAOWGGCCATAGCTATGTGCTCGTGACATG 

Gene : MAGE-1

Segment# : 13
Offset : 181
1st Codon : 1

YDGLLGDNQIMPXT G PL IIVLVMIAMEGGH 
TACGATGGCCTCCrGGGAGACAATCAGATrAltJ4X'rAAGA 

Gene : MAGE-1
Segment# : 14
Offset : 196
1st Codon : 1

P L I IVLVMIAMBGGHAPBBBI WEELSVMEV 
TTCCTCATCATTCntXrTCGTGATGATXXXrrATGGAA 



Gene : MAGE-1

Segment# : 15
Offset : 211
1st Codon : 1

A P B B B IWBBLSVMBVYDGRBHSAYGBPRKL 
GCCCCTCAGGAAGAGATTTGGGAAGAGCTCAGOG^ 

Gene : MAGE-1

Segment# : 16
Offset : 226
1st Codon : 1

YDGRBHSAYGBPRKLLTQDLVQEJCYLBYRQ 
TACGATGGCAGAGACCATAGCGCTTACGGAGAGCCT 

Gene : MAGE-1

Segment# : 17
Offset : 241
1st Codon : 1

LTQDLVQBKYLBYRQVPDSDPARYBPLWGP 
CTGACACAGGATCTGGTOCAGGAAAAGTATCTQGAATACA 

Gene : MAGE-1

Segment# : 18
Offset : 256
1st Codon : 1

VPDSD PARYBFLWGPRALABTS YVKVLBYV 
GTGCCTGACTCCGACCCTGCrAGATAC^ 

Gene : MAGE-1

Segment# : 19
Offset : 271
1st Codon : 1

RALABTSYVKVLKYVIKVSARVR7FPPSLR 
^TGTGAAAGTGCTCGAGTATGTGAT 



Gene : MAGE-1

Segment# : 20
Offset : 286
1st Codon : 1

IJCVSARVRFPP 
ATCAAAGTGTCCGCCAGAGTGAGAT 



PSLRBAALRB 



B B G V A A 



Gene : MAGE-3
Segment# : 1
Offset : 1
1st Codon : 1
AAHPLBQRSQ 
ACAGA 



CKPBBGLBARG BALG L V G A 



Gene : MAGE-3

Segment# : 2
Offset : 16
1st Codon : 1
B G L B A R G 



BALGLVGAQAPATB 



QBAASS8S 



Gene : MAGE-3

Segment# : 3
Offset : 31
1st Codon : 1

QAPAT BBQBAASSSSTLVBVTLGBVPAABS 
CAGGCTCCCGCTACCGAAGAGCAAGAGGCTGCCT 

Gene : MAGE-3

Segment# : 4
Offset : 46
S L P T



Gene : MAGE-3

Segment# : 5
Offset : 61
1st Codon : 1

PDPPQSP QGASSLPTTMNYPLMSQSYBDSS 
CCCGATCCCCCTCAGTCXXXXX^UUXX^ 

Gene : MAGE-3

Segment# : 6
Offset : 76
1st Codon : 1

TMNYPLMSQSYBD S SWQEEEGPSTPPD LB S 
JICCATGAACTICTCCCCTCIGGTCCCAGTCCT^ 

Gene : MAGE-3

Segment# : 7
Offset : 91
1st Codon : 1

H Q B B B G P 3 T P P DLBSBFQAALSR V A B L V H



AACCAAGAGGAAGAGGGACCCTCCACCTTTCCC^ 



Gene : MAGE-3
Segment# : 8
Offset : 106
1st Codon : 1

BFQAALSRKVABLVH 
GAGrTTCAGGCTGCXXTTCAGCAGAAAGGTC 



FLLLKYRARBPVTKA 
VTACAGAGCCAGAGAGCCTGTGACAAAGGCT 



Gene : MAGE-3

Segment# : 9
Offset : 121
1st Codon : 1

PLLLKYRARBPVT KABMLGSVVGWMQY FPP 
TTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCAC^ 

Gene : MAGE-3

Segment# : 10
Offset : 136
1st Codon : 1

BMI * G SVVGH1IQYPFPVIFSXASSSLQLV FG 

CTGGTCTTCGGA 



Gene : MAGE-3
Segment# : 11
Offset : 151
1st Codon : 1

VIFSKASSSLQ&VFGI'BLMBVDPIGHLYIF 
CTC ATrTTCTCE AAGGCIAGCTCCAGCCTCC A G 



Gene : MAGE-3

Segment# : 12
Offset : 166
1st Codon : 1

T BLMBVDPIGHLY I FATCLGLSYOGLLGDN 
ATCGAACTGATGGAGCTCGACCCTATCCCACACCTCTAC ^ 



Gene : MAGE-3
Segment# : 13
Offset : 181
1st Codon : 1

ATCLGLSYDGLLGDHQIMPKAGLLIIVLAI 
GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTCG 



Gene : MAGE-3

Segment# : 14
Offset : 196
1st Codon : 1
QIH PXAGLLII 



VLAI IARBGDCAPBBKIWB 
CAGATTATGCCTAAGGCTCGCCTCCT

Gene : MAGE-3

Segment# : 15
Offset : 211
1st Codon : 1

IARBGDCAPBBKIWBBLSVLBVPBGRBDSI 
ATCGCTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCPGGGAG 

Gene : MAGE-3

Segment# : 16
Offset : 226
1st Codon : 1

B L S V LB V7BGRBDS I LG'DPK JCLLTQHPVQB 
GA<XrrCAGantXTGGAACTGTTTGAGGGAAGGGAAGAC^ 

Gene : MAGE-3

Segment# : 17
Offset : 241
1st Codon : 1

L G D P X X L L T Q H P VQBNYLBYRQVPGSDPAC 
CTGGGJWSACCCTAAGAAACTGCTCACCCAACACrrTC 

Gene : MAGE-3

Segment# : 18
Offset : 256
1st Codon : 1

NYLBYRQVPGSDPACYBPLWGPRALVETSY 
AACTATCTGGAATACAGACAGGTCCCCGGAAGCCATCCCGCT 

Gene : MAGE-3

Segment# : 19
Offset : 271
1st Codon : 1

Y B F - L " - G P R A L V B TSYVKVLHHMVKISGGPH 
TACGAATTCCTCTGGGGACCCAGAGCCXrrCGTGGAA 

Gene : MAGE-3

Segment# : 20
Offset : 286
1st Codon : 1

VJCVLHHMVKISGGPHISYPPLHBMVLRBGB 
GTGAAAGTGCTCCACCATATOGTCAACATTAGCQGAGGCCCTCACATT^ 



Gene : MAGE-3

Segment# : 21
Offset : 301
1st Codon : 1

ISYPPLHBHVLRBCBBAA 

Gene : PRAME

Segment# : 1
Offset : 1
1st Codon : 1

AAMBRRRLWGS XQSRYISMSVHTSPRRLVB 
GCaXTATQGAAAGGAGAAQGCTCrGQG GA AGCATrCAGTCO^ 

Gene : PRAME

Segment# : 2
Offset : 16
1st Codon : 1

YISMSVWTSPRRLVBLAGQS LLKDBAL 
TACATTACCATGAGC G TCTGG A CAACCXCTACGA^ 

Gene : PRAME

Segment# : 3
Offset : 31
1st Codon : 1

LAGQSLLKDBALAIAALB 
CTGGCTGGCXAAAGCCTCCTGAAAGACGAAGCOCTOGCCAT 



Figure 27 (Cont) 



WO 01/090197 



PCT/AU01/00622 




Gene : PRAME

Segment# : 4
Offset : 46
1st Codon : 1

ALBLLPRBLPPPLFMAAPDORHSQTLKAMV 

OCCCTCGAGCTCCTXXXrrACGGAACTt^^ 

Gene : PRAME

Segment# : 5
Offset : 61
1st Codon : 1

AAPDGRHSQTLKAMVQAHPFTCLPI.GVZ.NK 

CO0GCTTTCGATOG£3*GACACTCOCAGACACTGA 

Gene : PRAME

Segment# : 6
Offset : 76
1st Codon : 1

Q A W P P TCLPLGVLMKGQHLHLBTPKAVLOG 
CACGCn\A^Vrri\ACATGCCTCCC^^ 

Gene : PRAME

Segment# : 7
Offset : 91
1st Codon : 1

^G^O^H^ BTPKAV LDGLDVLLAQBVRPRRHK 

Gene : PRAME

Segment# : 8
Offset : 106
1st Codon : 1

LDVLLAQBVRPRRNKLQVLDLRKMSHQDPW 

Gene : PRAME

Segment# : 9
Offset : 121
1st Codon : 1

LQVLDLRKHSH QDPWTVIfSGHRASLYSPPB 
GTGCAAGTXXTCGACCTCAC<jAAAAACTCCCACCA * 

Gene : PRAME

Segment# : 10
Offset : 136
1st Codon : 1

TVWSGHRASLYSPP8PBAAQPMTKKRXVDG 

ACCGTCTGGTCCGGCAATAGCXXTAGCCTCT 

Gene : PRAME

Segment# : 11
Offset : 151
1st Codon : 1

PBAAQPMTKXRKVDGLSTBA BQPFI PVBVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGA^ 

Gene : PRAME
Segment# : 12
Offset : 166
1st Codon : 1

LSTBABQPPI PVBVLVDLPLKBGACDBLPS 
CTGTCCACOGAACOCGAACAGCCTITQiTTC 

Gene : PRAME

Segment# : 13
Offset : 181
1st Codon : 1

V D L P LKBGACDBLPSYLIBKVKRKKMVLRL 
GTGGATCTGTTTCTGAAAGAGGGAGCCIXnX^CGAACT 

Gene : PRAME

Segment# : 14
Offset : 196
1st Codon : 1

YLIBKVKRKKHVLRLCCKKLKI PAMPMQDI 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGA^ 



Gene : PRAME

Segment# : 15
Offset : 211
1st Codon : 1

CCKKLKIFAMPMQDXKMILKMVQLDSIBDL 

TGCTCTAAGAAACTt»AAATCTTTGC^ 

Gene : PRAME

Segment# : 16
Offset : 226
1st Codon : 1

It H I LKMVQLDSI BDLBVTCTMKLPTLAKPS 
AAGATGATXXrrCAAGATGGTGCAACTGGATAGCATT^^ 

Gene : PRAME

Segment# : 17
Offset : 241
1st Codon : 1

B V T _ C T * * L * T L A K F S PYLGQMI HLRRLLLS 
GAGGTCACCTGTACCTGGAAGCTCCCCACACTOGCT 

Gene : PRAME

Segment# : 18
Offset : 256
1st Codon : 1

PYLGQHINLRRLLLSHIHASSYISPEKBEQ 
CCCTATCTGGGACAC^TCATCAATCTG^ 

Gene : PRAME

Segment# : 19
Offset : 271
1st Codon : 1

HIHASSYZ S P B KBBQYXAQFTSQFX.SLQCL 
CACATltAOGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAA 

Gene : PRAME

Segment# : 20
Offset : 286
1st Codon : 1

YIAQFTSQFLSLQCLQALYVDSLPFLRGRL 
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCC^ 

Gene : PRAME

Segment# : 21
Offset : 301
1st Codon : 1

QALYVDSLFPLRGRLDQLLRHVMNPLBTLS 

ouxxrixrr^rArxnt^TAGCxrrii'iTC 

Gene : PRAME

Segment# : 22
Offset : 316
1st Codon : 1

DQLLRHVMNPLBTL3 ITNCRLS BGDVMHLS 
GACCAACTGCTCAGGCATGTGATtSAACCCTCT 

Gene : PRAME

Segment# : 23
Offset : 331
1st Codon : 1

ITMCRLSBGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGCGAAGCCGATGTGATGC^ 

Gene : PRAME

Segment# : 24
Offset : 346
1st Codon : 1



Gene 



: PRAMB 
QSPSVSQLSVLSLSGVHLTDVS PBPLQALL
CACTCCCOrrCCGTGTCCCAGC^^ 

Gene : PRAME

Segment# : 25
Offset : 361
1st Codon : 1

VMLTDVS P BPLQALLBRASATLQDLV 
GTGATGCTGACAGAOCTrcAGCCCTGAGCCTCTtX^ 

Gene : PRAME

Segment# : 26
Offset : 376
1st Codon : 1

B R ASA T L Q D L V F DECGITDDQLLALLPSLS 
CAGACACCCTCaXX^CACTGCAACACCKX^^ 

Gene : PRAME

Segment# : 27
Offset : 391
1st Codon : 1

GITDDQLLALLPSLSHCSQLTTLSFYGHSI 
GGC^TTACCGATGACCAACTGCTCGCCCT^ i ^ i V CT A VrA TGGCAATAGCATT 



Gene : PRAME

Segment# : 28
Offset : 406
1st Codon : 1

H C 8 Q L T T L S P Y GMSI SISALQSLLQHL IGL 
CACTGTAGCCAACTCACAACCCTCAGCTTTTACGG^ 

Gene : PRAME

Segment# : 29
Offset : 421
1st Codon : 1

SISALQSLLQHLIGLSNLTHVLYPVPLBSY 
AGCATTAGCGCTCrGCAAACCCTCCrCCAACACCTCATC^ 

Gene : PRAME

Segment# : 30
Offset : 436
1st Codon : 1

S ML T H VLY PVPLBSYBDIHGTLHLBRLAYL 
*CCAATCrGACACACGTCCTGTATCCCC^ 

Gene : PRAME

Segment# : 31
Offset : 451
1st Codon : 1

BDIHGTLHLBRLAYLHARLRBLLCBLGRPS 
GAGGATATCCATGGCACACTCCATCIXXSAAAGGCTCG 

Gene : PRAME

Segment# : 32
Offset : 466
1st Codon : 1

H A R h R B LLC B LG RPSMVMLSAHPCPHCGDR 
CaCGCTAGGCTCAGGGAACTGCTCTGCt3AACIt3GGA^ 



HVWLSAtfPCPHCGDRTFYDPBPI LCPCFMP 

ATGGTCTGGCTCAGCGCTAACCCnat X ^^ 

Gene : PRAME

Segment# : 34
Offset : 496
1st Codon : 1

TPYDPBPI LCPCFMPNAA 
A CLTITrA CGATCCOaUVCCC A r^^ 
Gene : TRP2IN2

Segment# : 1
Offset : 1
1st Codon : 1

AALMBTHLSSKRYTBBAGGFFPWLKVYYYR 

Gene : TRP2IN2

Segment# : 2
Offset : 16
1st Codon : 1

B A G G F P P W L K V Y Y Y R P V IGLRVWQWBVISC 

GAGGCTGGOGGATTCTTTCCCTGGCTGAAAGTGTATTACT 

Gene : TRP2IN2

Segment# : 3
Offset : 31
1st Codon : 1

PVIGLRVWQWBVI SCKLIKRATTRQPAA 
TTCGTCATCGGACTGAGAGTGTGCX^GTGG^ 

Gene : NYNSO1a

Segment# : 1
Offset : 1
1st Codon : 1

AAMQABGRGTGGSTGDADGPGGPGI PDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCX& 

Gene : NYNSO1a

Segment# : 2
Offset : 16
1st Codon : 1

OADGPGG PGXPDG PGGNAGGPGBAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCG^ 

Gene : NYNSO1a

Segment# : 3
Offset : 31
1st Codon : 1

GHAGGPGBAGATGGRGPRGAGAARASGPGG 
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACACXX^ 

Gene : NYNSO1a

Segment# : 4
Offset : 46
1st Codon : 1

GPRGAGAARASGPGGGAPRG PHGGAASGLN 

Gene : NYNSO1a

Segment# : 5
Offset : 61
1st Codon : 1

GAPRG PHGGAASGLHGCCRCGARG PBS R L I* 

Gene : NYNSO1a

Segment# : 6
Offset : 76
1st Codon : 1

GCCRCGARGP B S R LLBPYLAHPPATPNBAB 

GGCTGTTGCAGATGCGGAGCCAGAOGCCCTGA^ 

Gene : NYNSO1a

Segment# : 7
Offset : 91
1st Codon : 1

BPYLAMPPATPHBABLARRSLAQDAPPLPV 

GAGTTTTACCrCGCCATGCCCTTTGCCACACCCAT 



Gene : NYNSO1a
Segment# : 8
Offset : 106
1st Codon : 1

LARRSLAQDAPPLPVPGVLLKBFTVSGNI L 
CTOGCTACGACAAGOCTCGCCCAAGAOGCTC\.CC C ^ 

Gene : NYNSO1a

Segment# : 9
Offset : 121
1st Codon : 1

PGVLLKBFTVSGNILTI RLTAADHRQLQLS 
CCCGGAGTGCTCCTGAAAGAGCTTACCOTCAGC^ 

Gene : NYNSO1a

Segment# : 10
Offset : 136
1st Codon : 1

TIRLTAADHRQLQLS ISSCLQQLS LLMKI T 
ACCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCA 

Gene : NYNSO1a

Segment# : 11
Offset : 151
1st Codon : 1

I S S C L Q Q L S L L NNITQCFLPVFLAQPPSGQ 
ATCTCCAGCTGTCTGCAACAGCTC^GCCTCCT^ 

Gene : NYNSO1a

Segment# : 12
Offset : 166
1st Codon : 1

QC FLPVFLAQP PSGQRRAA 
CA C IG TTTC CTCCCOGTCITCCTCOCCCA 



Gene : NYNSO1b

Segment# : 1
Offset : 1
1st Codon : 1

AAMLMAQEALAFLMAQOAMLAAQBRRVPRA 

GCCGCTATGCIOITOGCTCACGAAGCCCTCGCCTTTC^ 



Gene : NYNSO1b

Segment# : 2
Offset : 16
1st Codon : 1
OGAHLAAQBRRVPRAA 



VPGAQGQQGPRGR 



Gene : NYNSO1b

Segment# : 3
Offset : 31
1st Codon : 1

ABVPGAQGQQG PRGRBBAPRGVRMAARLQG 

ATGGCCGCTAGGCTCCAGGGA 



Gene : NYNSO1b

Segment# : 4
Offset : 46
1st Codon : 1

BBAPRGVRHAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGJW^AATGGCTGCCAGACTG 



Gene : LAGE1

Segment# : 1
Offset : 1
1st Codon : 1

AAMQABGQGTGGSTGOADGPGG PGI PDGPG 
GCCGCTATGCAAGCCGAAGGCCAAGGCACAGGCGGAAGCACA 

Gene : LAGE1

Segment# : 2
Offset : 16
1st Codon : 1

DAD PGGPGIPDGPGGNAGGPGBAGATGGR 
GACGCTGA<XGACCCGGAGGCCCTGGCATTCCCGATGGCCCT 

Gene : LAGE1

Segment# : 3
Offset : 31
1st Codon : 1

GHAGGPGBAGATGGRG PRGAGAARASG PRO 
CGCAAlGCCGGAGGCCCTGGCGAAGOCXX^kGCCACAGGCGGAAGG 



Gene : LAGE1

Segment# : 4
Offset : 46
1st Codon : 1

GPRGAGAARASGPRGGA PRG P HG GAASAQD 

GGGCCTAGGGGAG<33GGAGCC3GCrAGGGCTAGOGGAOCXAGAGGCG 



Gene : LAGE1

Segment# : 5
Offset : 61
1st Codon : 1



GAPRGPRGGAASAQDG 



C P C G A R 



R P D S R L L 

VTAGCAGACTGCTC 



Gene : LAGE1
Segment# : 6
Offset : 76
1st Codon : 1

GRCPCGARRPDSRLX#QI*HITH PPSSPMBAB 
GGCA(^TGCCCTTGOGGAGCCAQAAGGCCrGACr^^ 



Gene : LAGE1

Segment# : 7
Offset : 91
1st Codon : 1

QLH I T H P F S S PMBABLVRRX LSRDAAPLFR 
CAGCTCCACATTACCATGCCCTTTAGCTCCCCCATG^ 



Gene : LAGE1

Segment# : 8
Offset : 106
1st Codon : 1

LVRRILSRDAAPLPRPGAV LKDFTVSGNLL 
CTGGTCAGCaGAATCCTCAGCAGAGACGCT GC CC^ 

Gene : LAGE1

Segment# : 9
Offset : 121
1st Codon : 1

P G A V L K D P T VSGMLLPIRLTAADHRQLQLS 
CGCGGAGCCGTOCTGAAAGACTTTACGGTCAG 

Gene : LAGE1

Segment# : 10
Offset : 136
1st Codon : 1

FIRLTAADHRQLQLS I SSC T^Q 
T/ItATTAGGCTCACOGCTGCCX^TCACAGACAGCTCOGCrCAGCAT 



LSLLHWIT 

TGTCCCTGCTCATGTGGATCACA 



Gene : LAGE1

Segment# : 11
Offset : 151
1st Codon : 1

1 S 8 -___ J*_-° 0 JL- S L L W it 0CFLPVFLAQAPSGQ 
ATCTCCAGCTGTCTGCAACACCTCAGCCTCCTGATGTC 



Gene : LAGE1

Segment# : 12
Offset : 166
1st Codon : 1
QCFLPVFLAQA 



PSGQRRAA 
CaGTCTriTCCTCCCCCTCTTCCT
Segments in scrambled order:
MAGE-1 #15

A P B B B 1M B B L S V M B V Y DGRBHSAYOBPRKL 
GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGOGTCATGGA 

MAGE-1 #4

BBVPTAGSTDPPQSPQGASAFPTTIHFTRQ 
GAOGAAGTGCCTACCGCTGGCTCCACCGAT^^ 

PRAME #10

T V W SGHRASLYS PPBPBAAQPMTKKRKVDG 

ACCGTCTCGTCCGGCAATAGGGCTAGCCTCrACTCCTT^ 

MAGE-3 #14

QIMPKACLLIIVLAIIARBGDCAPBBKIWB 
CAGATTATGCCTAAGGCTGGCCTCCTGATTATOGTCCTG 



PRAME #9

LQVLDLRKH 
CTGCAAGTGCT 



SHQDFHTVHSGNRAS 



L Y S P P B 
ITAG C 1 TTCC OGAA 



RRWKLQVLDLRKNS HQDPW 
ITCTGAGAAAGAATAGCCATCAGGATTTCTGG 



PRAME #8

LDVLLAQBVRP 
CTGGAT 

NYNSO1b #2
QGAMLAAQ 

CAGGGAGCCAT 

PRAME #24

QSP SVSQLSVL3L SGVMLTDVSPBP LQALL 
CaGTCCCCCTCCGTGTCCCAGCTO^^ 

MAGE-1 #17

LTQDLVQBKYLBYRQVPDSDPARYB PLHG P 
CTGACACAGGATCTGGTCCAGGAAAAOTATCltX^ 



MAGE-1 #6

R 0 P S 
AGGCAACCCTC 



B G S S S 



RBBBGPSTSCILBS LPRAVIT 
JAGGGACCCTCCACCTCCTGCATTCTGGAAAG 



BAGE #1

A A MAARAVFLALSAQLLQARLMKBB 



S P V V s 



PRAME #34

TF Y D P B P I 
ACCTTTTACGATCCCGAACCCAT 



h C P C F M 



P H A A 

VTGCCGCT 



MAGE-3 #12

I BLMBVDPIGHLY I FATCLGLSYDG L L G O M 
ATCGAACTGATGGAOGTCGACCCfATCtiUACACCTCT^ 

GAGE-1 #2

RRYVBPP. BMIGPMRFBQFSD8VB PATPBBG 
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGA^ 



TRP2IN2 #2 

BAGGF FPHLKV 
GAGGCTGGCGGAT 



YYYRFVIGLRVHQWBVISC 
kTTACTATAGGTTTGTGATTGGCCTCAGGGTCTGGCA^ 



PRAME #1

AAM 8RRRLWGS IQSRYI S M 3 V W T S P R R Z* V B 
GCaXTATGGAAAGGAGAACGCTCTGGGCAACCATT^ 

TRP2IN2 #1 

AALHBTHL8SKRYTBBAGGFF PWLKVYYYR 

GCCGCTCXGATGGAGACACACCrCAGCTCCAAGAGATACA rTCCC T TGG CTCAAGGTCTACTATTACACA 
MAGE-1 #1

AAMSLBQRSLHCKPBBALBAQQ EALGLVCV 
GCCGCTATGTCCCTQGAAOUIAGAAGCC^^ 

MAGE-1 #3

QAATS SSSPLVLOTLBBVPTAGSTDPPQS P 
CAGGCTCCCACW ^^ 

PRAME #4

ALBLLPRBLFPPLPMAAPDGRHSQTLKAMV 

GCCCrOGAGCTCCTGCCTAGGGAACICTTTCCCC 

MAGE-3 #16

B L S - V — L — g V P 8 GRBDSILGDPKICLLTQHPVQB 
GAGCTCAGCGTCCrGGAAGTGTi'lXJAGGGAAGGGAAGACTO 

MAGE-1 #11

B S L Q I» V F Q IDVKBADPTGHSYVLVTCLGLS 
QAGTCCCTGCAACTGGTCTTCGGAATtXATGTGAAA 

MAGE-3 #5

P P --?-- P _ Q S PQGASSLPTTMNYPLWSQSYBDSS 
CO(^TC<XCCTC»GTCCCCCCAAGGOGCTA^ 

LAGE1 #1

AAMQABGQGTGGSTGDADGPGGPGI POGPG 
CCOTCTATGCAAGCCGAAGGCXAAGGCACAG 

NYNSO1a #12

QCFLPVFLAQPPSGQRRAA 

CAGTGTTTCCTCCCOCTCTTCCTCXXX^ . 

gp100In4 #2

TWGBGLPSQP IIHTCVYFPLPDHLSFGRPF 
ACCTGGGCMXAAGGCCTCCCCntXXAGCCTATC^ 

MAGE-1 #7

STSCILBSLFRAVITKKVADLVGFLLLKYR 
AGCACAAGCTGTATCCTCGAGlTXClt»lTT A ^ 

NYNSO1a #1

AAHQABGRGTGGSTGDADGPGGPGI P D G P G 
GCCGCTATCCAAGCCGAAGGCAGAGCCACAGGCQGAA^ 

GAGE-1 #7

D G P D O Q B M D P P NPBBVKTPBBBMRSHYVAQ 

GAOGGACCCGATGGCCAAGAGATGGAOCCTCCCAATCCCGAA 

NYNSO1a #11

I S 8Ct QQLSLLMWITQCFLFVFLAQPPSGQ 
AT<nxa^CClHmri>SCAACAGCTCAGCCK^ 

PRAME #26

BRASATLQDLVPDBCGITDDQT LALLPSLS 
GAGAGAGCCTCCGCCACACTGCAAGACCTOGTGTTTGACGA^ 

MAGE-3 #17

J* G D__f * % L L T Q H P VQBNYLBYRQVPGSDPAC 
CireCAGACCCrAACAAACIGCTOKXCT 

MAGE-1 #2

BALBAQQKALGLVCVQAATSS SSPLVLGTl* 
GAGGCTCTGGAAGCCCAACAGGAAGCCETCGGCCTOGT^^ 

NYNSO1a #7

B PY LAM P F A T PMBABLARRSLAQDAPPLPV 

GAGTTITACCTOGCCATGCCCrTTtXCACACCCATGGAGGC^ 

NYNSO1b #4

8 B A PRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCAAG 
GAGE-1 #3

BQPSDBVBPATPEBGE PATQRQDPAAAQBG 
GAGCAATTCTOCGACGAACTCGJUiCCCGCTACCCCTGAGGAAGGC^ 

MAGE-3 #6

T M H Y P _ L __* SQSYRDSSNQBBBGPSTFPDLBS 
ACCATGAACTATCCCCItrTGGTCCCAGTCCTACGAAGACTCCA 



MAGE-3 #7

NQBBBGPSTFPDLBSB FQAALSRKVAB 
AACCAAGAGGAAGAOGGACCCTCCACCTrTC^ 



L V H 



PRAME #13
V D L P I* K 
GTGGAT 



G A C D B 



LPSYLI EKVKRKKNVLRL 
rrGTTTAGCTATCTtSATTGAGAAAGTGAAAAGGAAAAAGA^ 



NYNSO1a #10

TIRLTAADHRQLQI*S X SSCLQQL5 L L H W I T 
ACCATTAGGCTCACCGCTCCCGATXACAGACAGCTCCAG^ 



MAGE-3 #1

AAMPLBQRSQHCKPBBGLBARGBALG 
CCXXXrrATGCCTCTGGAACAGAGAAGCCAACACTCT 



L V G A 



NYNSO1a #2

DADGPGGPGIPDGPGGMAGGPGBAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGCCATTCCC^ 

MAGE-3 #19

YBPLMG PRALVBTSYVKVLHHMVK ISGG PH 
TAOGAATTCCTCTGGGGACCCAGACCCCTOT 



PRAME #23
I T H C R I* S 



G D V M H L S QSPSVSQLSVLSLSG 
ATGTGATGCACCTCAGCCAAAGCCCTAGCGTCAGCCAACTG^ 



MAGE-3 #18

NYLBYRQVPGSDPACY 
AACTAXCTGCAATACAGACAGGTCCCOGGAAGCGAT 



MAGE-3 #11

VIPSKASSSLQLVFGIBLMBVDPIGHLYIP 
QI WTTrTC T OC AAGGCTAGCTCCWGCCTCCAGC 1 OGTC T 1 ItUXATroAGCTCATGGAAGTGGATCCC A TTGGCCATCT 



PRAME #21
Q A L Y V D 8 

CAGGCTCTGTATGTGGAT 



L F P 



LRGRLDQLLRHVMNPLBTLS 

^TCAGCTCCTGAGACACGTCATGAATCCCCTCG^ 



PRAME #20

YIAQPTSQPLSLQCLQALYVD 
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTC 



L P F L R G 



PRAME #7

GQBLHLBTPKAVLDGLDVLLAQ 
GGCCAACACCTCCACCT CGAgA CATTCAAAGCCGlXX^t»AT 



LAGE1 #10

PIRLTAADHRQLQLSZ 
TTCATTAGGCTCACOGCTGCCGATCACAGACAGCTCCAGCTCAGCAT 



3 S C L Q Q L 



1* L M W 1 T 



PRAME #15

CCKKLKIFAMPMQDIKMXLKMVQLDSXBDL 
TCCTCTAACAAACTGAAAATCTTTGCCATGCCC^^ 



NYNSO1a #5
G A P R G P 



GGAASGLMG C C RCGARGPBSRLL 
ATGCTGTAGGTGTGGCGCTAGGGGACCCGAAAGCAGACTGCTC 
MAGE-1 #8

KKVADLVGFLLLKYRARB PVTKABMLBSV I 
AAGAAAGTCGCTGA«TO7rGGGATTC^^ 

MAGE-1 #13

V D 0 L L G DNQIMPKTG P L I IVLVMIAMBGGH 
T^CGATGGCCTCCTGGGAGACAATCAGATTATGCCT 

PRAME #29

SIS A L Q S L L Q H L I G L S WLTHVLYPVPLBSY 
AGCATTAGCGCTCTtXAAAGOCTOCTCCAACA 

MAGE-3 #15
IARBGDCAPBBKI 

ATCGCTAGGGAAGGCGATTGCGCTCCX3GAAGAGAAAAT 

PRAME #22

DQLLRHVMHPLBTLSITNCRLS BGDVMHLS 
GACCAACTGCTCAGGCATGTGATGAACCCTCrG^ 

MAGE-1 #19

RALABTSYVKVLBYVIKVSARVRPPPPSLR 
GACAAGCTATCTCAAAGTGCTCCACTATGTGATTAAGg 



PRAME #30
SHLTHVXiYP 

AGCAATCTGACACAOGTOCTGTAT 



V P L B 



Y B 



D 1 H G ILJlJ? l b r l A Y L 

^TTCACGGAACCCTCCACCTCGAGAGACTCGCTTACCTC 



NYNSO1b #1

AAMLMAQBALAPLMAQGAMLAAQBRRVFRA 

GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTC 

I P GKASBSLQLVFGIDVXBAD 
AITITQJUAAAGGCTAGOGAAAGCCTCCACC^ 



PRAME #32

BARLR8LI*CBI»GRPSMVM 

CAOGCTAGGCTOUSGGAACTXyCTCTGOG 



L S A N PC 



H C G D R 
ATTGCGGAGACAGA 



PRAME #25

V M L T D V 3 P B PLQALLBRASATLQDLVFDBC 
GTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCA 

GAGE-1 #5

BDBGASAGQGPKPBADSQBQGHPQTGCBCB 
GAGGATGAGGGAGCCTCOGCCGGACAGGGACCCAAA 

MAGE-3 #10

BMLGSVVGHNQYFFFVXFSKASSSLQLVFG 
QAGATGCTGGGAACCCTCGTGOGAAACTGCCAGT ATTO 



GAGE-1 #1
AAMSMRGR 

GCCGCTAT 



STYRPRPRRYVBPPBMIGPMRP 

^TACAGACCCAGACCX^GAAGGTATGTGGAACtXX^ 



PRAME #2

YI SMSVWTS PRRLVBLAGQSLLKDBALAIA 
TACATTACCATGAGOGTCTCGACAAGCOCTAGGAGACTGGTCGAGC^ 



MAGE-1 #16
Y D G R B 

TACGAT 



HSAYGBPRKLLTQDLVQBKY LBYRQ 
«AGCGCTTACGGAGAGCCTAGGAAACrcCTCA^ 



LAGE1 #12

QCFLPVFLAQAPSGQRRAA 
CAGTGTITCCTCC(XXnX^ritX^OGCCC^^ 
MAGE-3 #20

VKVLHHMVKISGGPHtSYPPLHEHVLRBGB 
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCC^ 

LAGE1 #7

Q L - H 1 T M P F S SPMBABLVRRILSRDAAPLPR 

CAGCTCCACATTACCATGCXXrrTrAGCTCCCCCATGGAGGCT^ 

HYNSOla #9 

PGVLLKBPTVSGNILTIRLTAADHRQLQLS 
CCCQGAGTGCTCCIGAAAGAGTTTACCGTCAGCGGAAACA^ 

PRAME #16

KHI LKMVQLDS IBDLBVTCTWKLPTLAKFS 
AAGATC^TCCTCAAGATGGTGCAACTCX^TAGCATTGAOG^ 



MAGE-1 #14

V L I I 
TTCCTCATCAT 



VLVMIAMBGGHAPBBBIWBBLSVHBV 
ITGATCGCTATGQAAOGOGGACACGCrCOCGAAGAGGAA^ 



PRAME #17

BVTCTNKLPTZ.AKFSPYLGQMIIILRRI.LL8 

GAGGT(.JW^~i\>rjvCCTGGAAGCTC 

MAGE-3 #2

BGLBARGBALG LVGAQAPATBBQBAAS SSS 
GAGGGACTGGAAGCCAGAGGCGAAGCCCTCGGCCT^ 

MAGE-3 #21

I S Y PPLHBHVLRBGBBAA 
ATCTCCTACCCTCCCXTTCCACGAATGGGTCCTG^ 

PRAME #19

RIHASSYIS PBKBBQYIAQPTSQPLSLQCL 
CACATTCACGCTACCTCCTACATTACCCCT^^ 



HYNSOla #3 

° M A G G P 0 B A __Q ATGGRG PRGAGAARAS 
GGCAATGCCGGAGGCOCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGA 



G P G G 



HGGAASGLN 
lTGGCGGAGCCGCTAGCGGACTGAAT 




HYNSOla #8 

LARRSLAQDAPPLPVPGVLLKBPTVSGNIL 
CTCGCTACGAGAAGCCTOGCCCAAGAC CC TCCCCCTCTGCCT 



PRAME #5

AAPDGRHSQTLKAM 

GCCGC4"I4Cl^TGGCAGJVCACTCC X rAG A CACTGAAAGCCAT 



VQAWFFT CLFLGVLMK 



MAGE-1 #20

IKVSARVRPPrPSLRB 

ATCAAAGTGTCCGCCAGAGTGAGAT 

PRAME #27

GITDDQLLALLPSLSHCSQLTTLSPYGMSI 
GGCATTACCGATGACCAACTGCTCGCCCTCCItXCT 



GAGE-1 #8 

VKT PBBBMRSHYVAQTGI 
GTGAAAACCCCIGAGGAAGAGATGAGGTCCCACT AT GTGGCTC ^ 



L If 



LLMNNCPLNL 
VTAACTCTTTCCTCAACCTC 



LAGE1 #11

XSSCLQQLSLLMWXTQCFLPVFLAQAPSGQ 
ATCTCCAGCrGTCT^X^UVCAGCTCACCCTCC^^ 
PRAME #14

YLI'BKVKRKKHVLRLCCKKLKI PAMPMQDI 
TACCTCATCGAAAACGrattGAGAAAGAAAAA^^ 



MAGE-1 #9

ARB PVTKABMLBSVI KNYKHC 
GCCAGAGAGCCTCTGACAAAGGCTGAGATGCTOGAAAGOGTC^ 



FPEIPGKAS 
ITCTTTGGCAAAGCCTCC 



LAGE1 #8
L V R R I 



L S R D A A P 



LPRPGAVLKDFTVSOMLL 

VTTTCACAGTGTCCGGCAATCTGCTC 



PRAME #28

HCSQLTTLSFYGHS I S I SALQSLLQHLIGL 
CACTGTAGCCAACKjAOUVCCCTCAGCTTTTAOT 



PRAME #33

MVWLSANPCPHCGDRTFYDP BPI 



L C P C 



P M P 

rATGCCT 



gpl001n4 #1 

AASWSQKRSPVYVWKTWGBGLPSQPIIHTC 
GCCCCTAGCraSVGCCAAAAGAGAAGCTn^^ 



BAGE #2

llqarlhk 

ctgctccaggctaggctca: 




HLNPCDPLAA 
kTCTGAATTTCTGTGACXTTCTGGCTGCC 

PRAME #18

PYLGQMIWLRRL LLSHIHA SSYI 3 P B K B B Q 
CCCTATCTGGGACACATGATCAATCTCAGAAGGCTCCTGCTC 



MAGE-3 #3
Q A P A T B B 



Q B A A S S 



SSTLVEVTLGBVPAARS 
^TGGTCGAGGTCACCCTOXXXjAAGTGCCTGCCGCTG^ 



PRAME #6
QAWPFTCL 



PLGVLHKGQHLHLBT PKAVLDG 

VTCTGCATCTGGAAACCTTTAAGGCTGTGCTOGACGGA 



PRAME #12
L S T B 



A B 0 P 



FIPVBVLVDLPLKBGACDBLFS 
^ TTCCCGTOGAGGTOCTGGTOGACC ICTTOCTC!AAGGAAGG<X3CTTCCGATGACCTCTTCrCC 



HYNSOlb «3 
A B V P 



M A A R L Q G 

TGGCCGCTAGGCTCCAGGGA 




LAGE1 #4

GPRGAGAARASGPRGGAPRGPHGGAASAQD 

GGCCCTAGGGGAGCOGGAGCOGCTAGGGCTAGCGGACCCAGAGGCGGA^ 

PRAME #3

LAGQSLLKDBALA I A A L B 

CTUGCTGGCCAAAGCCTCCTGAAAGAOGAAGCCCTCGCCAT 



GAGE-1 #4

BPATQRQDP 
GAGCCTGCCACACAGAGACAGGAT 



AAAQBGBDBGA3AGQG PKPBA 



PRAME #11

PBAAQPMTKKRKVOGLSTBABQPPI PVBVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTG^ 
LAGE1 #6

GRCPCGARRPDSRLLQLHITHPPSSPMEAB 
GGCAGATGCCCTTGCGGAGCCACAAGGCCTGACTCCAGG 

LAGE1 #9

PGAVLKDPTVSGNLLFI RLTAADHRQLQLS 

CCOGGACCOCntXnt S AAACACTTTAOCGTCAGCG 

PRAME #31

KDIHGTLHLBRLAYLHARLRBLLCBLGRPS 
QAGGATATCCATGGCACACTGCATCTGGAAXGGCTCG 

DGPDGQBMDPPNPBB 

ATGGCCCTGACX3GACAGGAAATGGATCCCCCTAA 

TRP2IN2 #3 

FVIGLRVWQWBVI SCRLI KRATTRQPAA 
TTCGTCATOGGACItyU3AGTGTGGCAGTGG 

LAGE1 #2

DADGPGGPGIPD G P G GNAGGPGBAGATGGR 

GACGCTGACXX^CCCGGAGGCCCTG^ 

MAGE-1 #12

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTQ 
CCCACAGGCCATAGCT A TGtG CT aa C ACATCXCTOGG 

MAGE-3 #9

PLLLKYRARBPVTJCABMLGSVVGNWQYFFP 

yrOCT CC TGCTCAAGTATAOGGCTAOGGAACCOGTCA rTTTTCCCT 

GAGE-1 #9 

TGILWLLMMMCFLMLSPRKPAA 
ACOGGAATCCTCTGCCTCCTGATGAACAATroCTTTC^ 

LVHPLLLKYRARBPVTKA 

TCGTCCACTTTCTGCTCCTGAAATACA^ 

MAGE-1 #18

VPDSDPARYBPLWGPRALABTSYVKVLBYV 

GTGCCTGACTCOGACCCTGCCAGATACGAATTCCTCTCG 

MYHSOla #6 

G CC R C O A R Q P B g R LLBPYLAMPPATPMBAB 

GGCTGTTGCAGATGCGGAGCCAGAGGCCCTCAGTC 

MAGE-3 #13

AT CLG L SYDG LLG DHQ I M PKAG L L I I V L A I 

GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTCGG 




Artificial Protein: 



APBBBIWKKLSVMHVYIXniEHSAYGBPRKIjEKVPTAGSTDPPOSPQGASAP 
LIIVLAI UUKGDC^BBXIWEI4VLDUra 
BVPGACX^QQGPRGROSPSVSOLSVl^LSGVMLTDVSPBPI^ 

FRAVITAAMAARAVFIjALSAQIJiOARLMlCBBS PWSTFYDPB PLLCPCFMPHAAI BUtBVDPIGHLYI FATCLGLSYDGLLGDNRRYVBPP8MIGPHR 

PBQPSZ>BVEPATPBBGBAGGPPPMIJCVYYYRFV1GLRVWGWBVZSCAA 

WUCVYYYRAAMSLStftSLHCKPBBAXAATO 

BLSVLBVFBGRBDSIU&PKXIATQHPV^^ 

GTGGSTGDADGPGGPGIPDGPG&TUWI^ 

KYRAAMQABGRglta G ST GD ADGPGG PG I P1X3PGDG PtX^BHDPPNPBBVlCrPBB BMR5HYVAQI SSCLQQLSLI>W ITQCFLPVPLAQPPSCQBRASA 

TLQDLVFDBCSITTOQUJUJ^SL^^ 

ABI*ARRSLAQDAPPLPVBBAPRGVRMA^ 

BB£GPSTPPDLBSMQBBBGPBTPPDLBSBF0^ALSRXVAB1*VU^ 

LMWITAAMPLBOJISQHCKPBBGLBARGEALGLVG^ 

OU^BGDVKHLSG^PSVSOLSVI^LSGNYI^YROVP 
W3UJ>QUJUiVMNPLBTI*SYIAQFTS^

LQOI>SIJJWITCCKXLKIPAHPMQDIKMIIJ<MVOI^SIEDLGAPRCPH^ 

SVIYTCLUa*JQIMPICTGFTiIIVLW^ 

HVWPLBTI^ITOCRI^B(a)VHHLSlU^^ 

KAQGAMLAAQBRKVPRAKNYKHCFPB I FGKASES LQLVPG IDVKBADTLVBVTLGBVPAABS PI>P PQS PQGASS1JTHARIJIKIXCELGRPSMVWLSA 

HPCPHOa>RVMLTDVSPBPI4)AIABRA^ 

GAAMSWRGRSTYRPRPIUtYVBP PBMIG PMRPTC 

IA(^SGQRRAAV1CVLHHMVKISGGPHXSTO^ 

LQl^»aiJWVQU)SIEDLBVTCI10a*PTIJUCP^ 

AKGEAliGLVCAQAPATBBQRAASSSSISYPPLH^ 

ASGPGGGPftGAGAARASGPGGGAPRGPHGGAASGUfQGASAFPTTINFT^ 
FPGjWSQTIJOtttVOAIfPFrCLPMWJro 

CFPBircKASLVIttII^RnAAPLPRP<^VIJ^ 

MPAASKSQKRSFVYVWKTWGEGLPSQPI IHTCLLQARLHKBBS PWSWRL8PBDGTALCPI PVYFFLPDttLS FGRPFHI^PCDPXAAPYIiGQMIHLRR 

LLI^IHASSYISPBKKEQOAPATKBQBAASSSSTLVBVTIiGBV^ PVEVI.VDLP 

UCEGACDolJ^SABVPGA<X3QCX3PRGRBKAPRG\W 

AQPLAGQSLIJCDEAIAXAALBI^PREI^PPIiPKEPATQR^ 

GARRPDSRI*LQIlIITWPFSSPMKAEP<^VIJro 

CBDGPDGQBMDPPNPBBFVIGLRWQWKVISCia.1 KRATTRQPAADADGPGGPGI PDGPGGNAGGPGBAGATGGRPT6HSYVLVTCL6LSYIxnJX33H 

QIHPKTGPIXUCyRARBPVriCABHUSSVVG 

BFI^PSAIAOTSYVJCVLBYVGCCRC^^ 

RGAGAARASGPRO 



Artificial DNA:




GCCCCTGAOGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAA 
GCCTACOGCT QGC TCCACCG AT CCCCCTr W 
ATAGGGCTAGCCTCTACTCCTTCCCTGAGCCTGAG 
CTt^TTATCGTCCTQGCTATCATTGCCAGAGAGG^ 

CCAAGA lTlTltSG ACAGTCTTXMWG^^ i CC OGIUVCTGGATGTCCTCCTGGCTCftOGAAOIt^ 

OGAAOCTCCAGC»rCCTGGATCTCAGAAAGAATACCCATCA0 

GAGGTCCCCXX»CCCCAAGGCCAAC^^ 

TTCAGAGCCGTCATCACAGCCGCTATGGCTCXrCAGAGC^ 
CGTOGTGTCCACCTTTTACl^TCCCGAACCCAT^ 
TCTACATTTTOGCTACCTCTCTGGGACTGTCCTAOC^ 
<XCGAACAQTTrAGCGATGAGGTCGAGCCTCCCACAC^ 
GATTGGCCTCAGGGTC7t?GCAATGGGAAGTGATTAGCI(^^ 
CCGTGTGGACCTCCCCCAGAAGGCTCGTGSGAAGCC^ 
TGGCTCAAGGTCTACTATTACAGAGCCCCTAT 

KCTGGAAGAOGTCCCCACAGCOC3GAAGCACAGACCCTCOCC 
ATAGCCAAACCCTCAAGGCTATGGTC 
GAGCTCACCXTrCCTGQ J UV G ^ ^ ^ 
GCAACTGCTCTTCGGAATCGATCTT^AAC^ 
CCCCCCAAGGCXX^CCTCCCrGCCT A C^^ 
GXa^CAGGOX3VAGCACAGGCGATG^ 

GCTTTGQCACACCCTTTA<X3tf3lAGCTg 
AAATACAGAGCCGCTATGCAAGCCGAA<^^ 
AGACGGACCCGATGGCCAAGAGATGGACCCTCCCA^ 
GCTGTCTtX3U^C^GCTCA<XXnCCT(^T^^ 

OUXX!AACACTTICTCX3UiCg^GAAT^ 

GCTGAGCTCGCCAGAAGGTCCCTGGCTCAGGATGCCOCTC^ 
TGCCTGGAGACltSGAACCOGAAGACGGAACOSCTtTl^^ 
GOGAACCCGCTACCCAAAQGCAAGACCCTCCC CC TGCCC^ 
GAAGAGGAAGGCCCTAGCACATTCCCT^^ 

TCTGTCCAGGAAAGTGGCTGAGCTOGTGCAIGTQGATCTGT^ JA GCTATCTGATTGAGAAAGTGAAAA 

QGAAAAAGAATGTGCTCAGGCrCACCATTAOGCTCACO^ 

CTCATGTGGATCACAGCCGCTATGCXrrCTCGAACA 

ACGAATTCCTCTGGGGACCCAGAGCCCTCGTGG^ 

AC»GCTCC CCQSMgCCT 

GCCTCCAGCTOGT(nTTGGCATTGAGCTCATGGAAGlXKATC^ 
AGAGGCAGACTGGATCACCTCCTGAGACACGTC^ 

AOGTCCTGCTOGCCCAAGAGGTCAGGCCTAGGAGATGGAAAT^ 

ATGTCGATCACATGCTGTAAGAAACTGAAAATCTTTGCCATGC^ 
CCGAAAGCAGACTGCTCXAGAAAGTC

TCCGTGATTTACGATGGCCTCCTGGGAGACAATCAG^ 

TAGCATTAGCGCTCTGCAAAGCCTCCTGCAAC^^ 

GGCtfUlGGCGATTGCGCrCCCGAAGAGAAAATCTGG 

CATGTGATGAA CC CTCTGGAAACCCTCAOCATTTtCCAAT^ 

TGTGAAAGTGCTCGAGTATCTrGATTAAGGTCAGOCCT 

CCCTCGAGTCCTXCGAAGACATTCACGGAACCC^^ 

ATGGCCCAAGGCGCTATGCItXXX^GCraVGGAA 

(XOVAAGCCCTCAGGGACCCTCCA^^ 
AATCCCTGTCCC CATTSOGGAGACAGACTGATGC^ 

GCGAATGCXJAAGAflATOCTGGGAAOOGTCGTGGGAA^ 

QGACCCtXTTATGTCCTCGAGAGGCAGAAGCA^ 

TAGCATCAGCGTCTCXiiACAAGc^^JttGG^ 

A<XATAGCGCTTACGGAGAGCCTAOGAAACrGCTCA^ 

CTCGCCCAA<XXXCTAGCGCMA<»^ 

GCATGAtyTQGGTGCTCAGtXSAAOGCGAACAGCTC^ 

ATGCOtXTCCXCTCCCCAGACCCGG^^ 

CIXSCAACTOTCCAAGATGATCCTCAAGATGGTGCAA^^ 

CTCCTTCCTCATCATTgreCIXCT^ 

TCACCTGTACCTOGAAGCTCCCCACACTGGCTAACrT^^ 

GCCACyUKXXUlACXXXTOOGCCT^^ 

CX5AATGGGTCCTGAGAGAGGGAGAGGAA<XXXX7rCACAT^ 

GCCTCOOGCCCTGCXXWGGCCCTAGGGGAG^ 
CGGACTGAATCACXXtAGCCTCOG C C rrTCCXIA CA^ 

TTOGATGOCAGACACTCCCAGACACTGAAAGCCATGGri^ 
CAGAGTGAGATTCTTTTTCCCTAGCCTOWCSGG^ 
TCCTGCCTAGCCTCAGCCATTKrra 
CTCCCTCAGAOIGGCATTCTGTQGCTCCTCATGAAT 
CCAATGCTTTCT^CTGTGTTTCTGGC^^ 
AAAAGCrCAAGATTTTCCXTTATGCCTATGCAAGACAT^ 
TGCTTTC 



ATCCTCAGCAGAGAC 
CAOVgreTCCGGCAATCTtXrKXaC^ 

ATCTCATTOGCCTCATGGTCTTX SC rCXCX^ I x XA? TYt 

ATGCCTGCaxrrAGCTQGAGCOU^AAGAGAAGL'l^ 

GCTCCACGCTACGCTCATGAAAGAGGAAAQCCCTGTQ^ 1 rTGTGTATTTCT 

TTCTGCCTGACCAIi:ut,TLt;i^^^ 

CTCCTCCTCACCCATATCCATCXXnrCAGCrAT^ 

CACCACACTQGTCaAQ(m3tfXXrTOCXX^^^^ 

AGCATCTtX!ATCTQGAAACCTTTAAGUCTtntXriXX^ C1TC 

nt^XTRXXXXTTCAGGGACAGCAAGGCCCTAGGGGAAG 

3TGTCCCTGTGGCGCTAGGAGAC 

GCTCAGGAT 

GGAGCCTGCCACACACAGACAGGAT 
CCGCTCAGCCTATGACAAAGAAAA«iAAAGTGGATGG 
GGAGCCAGAAGCCCTGACTCCAGGCTCC^ 
TACCGTCAGOGGAAACCTCCTGTITATCA^ 
GGCTOGCCTATCTGCATGCCAGACTOAGAGAGCTCCTC 
TGTGACX^TGGCCCTOACGGACAQGAA^ 

CCGGAGACGCTGGCGCT 
CAAATCAT 

ATACTTTTTCOCTACOOQAAT 



iTACCTATtyrGCTOGTGACATCCCTOGOCCTCAGCT 

^TGCTCGGCTCCGTGGTOGGCAATTGGCA 
ATTGCTTTCTt^ATCrCTCCCCCAGAAAGCCTGCCGCTC 
GCAGAAAGGTCX?CCXSAACrCGTCCACTTTCTGCTCCT^ 
GAATTCCTCrGGGGACCCAGAGCXXTTCGCCGAAA 
CAGGCrCCTGGAATrCTATCTGGCTATGCCTTTOGCTA^ 
ACCAAATCATCJCCCAAACCCCGACTGCTCATCATTOT^ 
Cassettes for construction of a full-length HIV Savine
Cassette A1

gga t ccaccATGACAG<X:CCTTGCACM 

(XX!AACTGCTCCTGAATGGCTCCCTGAOAAGCCT 

TC^CGTCAGGGACACAAAGGA^ 

TCXAGCTCCTTCAACTTTCCACAAATCACACre 

CCGATATGGTGATTTACCAGTACATGGACGA^^ 

ACCCX3ATAAGAAA<^CCAAAAGGAACCACX!ATTCCTCTO 

CAGCCTCTTAATTTCCCTCAGATTACaTO 

AGGCTCTGCTCGAGACAGGCTCCTATGGCAGAAA^ 

TCACCAATACXICTATCTCTGAGaUVC^ 

GAGTTTTCXIAGCGAACAGACAAGAGCCAATAGC^ 

TCATTCTCGGATCTGGCACa^AAAACGCCGCT 

GATTGAGGAGGTGCAAAAGAAAAGCGAGOVAAAGAO^^ 

GGAAGGCAAAAGATTATCTCCXriX^CAGAGACAACCAA 

CCACACTGTTTTGCXXXAGCGATGCCAAAGCCTO 

CCCCXXTTGACGATACAGTGCTGGAGGAGATGAACCT 

GGATTCATTAAGGTGAGAAAAATCGGACCCGAAAA 

CCACCAAATGGAGAAAGCTCGTGGATTTCAG 

CTCCGAAGGCTCCAGGCAAACCMAAAG 

AGACTCGTCAACGGATTCTTAGCCXriX^ 

CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGC^ 

TGCCATGGCTGGCAGAAGCGGCGGCACAGACGAA^ 

TCX^CXXTITACXXriTCCGCT^ 

AGAAAGCCAAAGGCTGGTICTATAGGCATCACT^ 

CAAAAAGGAAAAGGTCTACXrTATCATGGGTACCA^ 

ATTATCAAAATCCAAAACTTTAGGGTCTACTAT^ 

AGGAAATCTGGAACAATATGACATGGATTGAGTGGGAGA 

TCTGAAACCCX^CCOICAGCCCCTCCC^^ 

GAGCAAAAGGATAAGGAGCAATACGATCAGATTCTTATTG^^ 

TGGGACCTACCCCTGTGAATATCATTGGCAGAATTTACGAAACCT 

GATCAGAATCCTCCAGCAACTGATGTTTATCCATTT^ 

AAAGGTCrOXX!ATTAGCCACGGAAGGAAAAAGAGA 

TGGACCTCCAAGCTGGAGCCTTGGAAACACCCTG 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACA 

CTCAAGTCCCTGTTTGGCAATGACAATTTCAATATO 

TAAAAACTTCAGAAAGTATACCGCTTTCACAATC 
GGCCAAGTGAATTGCTCACCAGGCATTTGG*^ 
AGCAATGGCCTCTGACAGAGGAAAAGATTAAGGCTCT^ 
TAGCATGGATGACCTCTACXn*CGGCTCCGACCTGG 
AGATTGGCCAACATAGCIACCAAAATCGAAGAGCT
CCAAAAGACTCAGCTCCAAGCTATCCATC^ 
CCCGCTGAGACTGGTCAAGAGACCGCXrrTTTT^ 
CAGACAATGGCAGGACAAAGATTGAGGAACTCAGACCGCAT^ 

AAGAGACGCAGAGAAAATCACACAATGAATG<X:CATACTGC^ 

AGGAACTGCTGGAGCTCGACAAATGGGCAAGCCTC^ 

AGTGTCCCAGAATTACCCTATCGTCCAGAATGTCCAAG^ 

GGACTGAGAATCGTTTTCGCTGTGCTCM 

CCCTCCCCCTCATCCATCrGCAArACTTTGACTGT^^ 

AGTGAGAAGGAGATGCGAATACGCTGTGGGACTCGGAGC 

ATGGGCGCTGCCTCCATGACACrGACAGTC 

AGGGCCAGGGTCAGTGGACATTTCZAGATTTTCCAA 

CGTCAACATCATCGGAAGGAACATGCTGACAC^ 

GCTATCTTTC^GTCCAGCATGCCACAGATTCTGGAGCCT 

ATCCTAGCCCTCTGJVCATTCGGATGGTGT^ 

GGGCGAAAACAATTGCCC(XTGTTTAGGAAATACACAGCCTI^ 

ATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAGCAC^ 

AGGCTAGGCTACTGCTCAGCGGAATCGTCXZAGCAA 

GCCTGTGCATCXXX5TCTACTACGATCX!CTCCAAGGAT^ 

TCCACCATGGTGGATATGGGAAACTACGACCTCGGAGTGGACAATAA^ 

TCATGTTCATTCACTTTAGGATTGGCTGCCAGC^ 

CAGGAAAAAGGGATGCTGGAAGTGTGGCAGAGAGGGACACCAG^ 

CTCGGAAAGGATGCCAGACrGGTTATCAAAACCT 

ATGGCGTCAGCATTGAGTGGAGGATAAGGGAAAG 

GCTCAGCACATTGGTGGACATGGGCAATTACX^TC^ 

ATCXSAAGAGGAAGGCGGAGAGCAAGGCAGAGGCAGAAC^^ 

ATGAGGGAGAGAATAACTGTCTGCTTCACC 

CGATATCAAAGTGGTCCXTCAGAAGGAAAGCCAAAATCATTAGOT 

GTGGCCAGCTICrCITCCGAGCAAACAGG 

ACAGACAGGGAACAAGCTCCAGCTGTTTCAAT^ 

CAAGAAAGGTTGTTGGAAATGCGGAAAGGAAGGCCATCAA^ 

GGCAAAATCTGGCCCTCCAACAAAGGCAGACCCGGA 

TTATCATGATCGTCGGTGGACTGATTGGCCTC^^ 

CCX3AGACCTCX3ATAAACATGGCGCTATTACAAGCTCCAA 

GGACAATCGTCTACATFGAGTATGTCGACtgaaga t c tgaat t C 
A2 fragment

ggatccaccATX^CAGGCCCTTGCACAAAtt^ 

CCCAACTGCTCCTGAATGGCTCCCTGAAAAGC 

TGAGGTCAAGGACACAAAGGAAGCCCTCGACAAAATC 

TCCAGCTCCATCAACTTTCCAC^^ 

CCX3AAATGGTGATTTACCAGTACATGGACGATC^ 

ACCCGATAAGAAACACCAAAAGGAACCACCATTCCTCTGG^ 

CAGCCTTTTAATTTCCCTCAGATTACCCTCTC 

AGGKTTCTGCTCXSACACAGGCTCCT^ 



GAGTTTTCCAGCXJAACAGACAGGAGCCAATAG 
TCATTCTGGGATCTGGCACX)!AAAA 

GATTGAGGAGGAGCAAAACAAAAGCAAGCAAAAGACACA^ 
GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACC^ 



CX:CCXXrrGACGATACAGTGCTGGAGGAGATGAACCTCCC^ 

GGATTCATTAAGGTGAGAAAGATCGGACCCGAAAACCCTTACA 

(XIACCAAATGGAGAAAGCTCGTGGATTTCAGAAT^ 

CTCCGAAGGCACCAGGCAAACCAGAAAGAATAGGAGAAG 

AGACTGGTCAACGGATTCTTAGCCCTCGCC^ 

CCGTCTACTATGGCXrrcCCCXSTCTGGAGAGAGGCT 

TGC(^TGGCTGGCAGCAGCGGCAGCACAGAC^ 

TCCAACCCTTACCCTTCCGCTAGTATGAAAATCAGAACC^ 

AGAAAGCCAATGGCTGGTTCTATAGGCATCACTTT^ 

CAAAAAGGAAAACXnXTTACCTATCATGGGTACCAO^C 

ATTATCAAAATCCAAAACTTTAGGGTCTACTATA 

AGGAAATCTGGAACAATATGACATGGATTCAGTGG 

TCTGAGACCXXSAACCCACAGCCCCirc 

GAGCCAAAGGATAAGGAGCAATACGATCAGATTATra 

TGGGACCTACCCCTCTGAATATCATTGGCAGAATIT 

GATCAGAATCCTCCAGCAACTGATGTTTA 

AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACA^ 

TGGACCX^CAACCTGGAGCCTrGGAAACACCCTGGCTTC 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAAO^ 



TCTCACTATGGGACCAAAGCCTCAAGCCTTGCGTCA^ 

TAAAAACITCAGAAAGTATACCGCTTTCACAATCCCT 

GGCCAAGTGAATTGCTCACCAGGCATTTGGC^ 

AGCAATGGCCTCAGACAGAGGAAAAGATTAAGGCTCT 

TAGt^TGGATGACCTCTACGTCGGCTCCGACCTG 





kTGCCAAAGCCTATGACACAGAGGTCCACAATGTGTGGG CCACACACGCTTGCGT 




^TGACAATTTCAATATGTGGAAGAATAACATGGTGGAACAGATGCA 
CACXTTCCTGAGATGGGGACTCACCGACAC

ACTCCGGCTTAGAGGTCAACATTGTGACAGACATTCCCG^ 

ACTGGCTGGCAGATGGCCIXnXJTVGAATCATTCACACA^ 

CTGCTCAAATGGGGCTTCACAACCCCTGACAAAJ^^ 

TGJVCAGAGGATAAGTGGAACAAACCCCAGAAAATCA^ 

CACAGAGTCCCAGAATCAGCAAGACAGAAACGAAAAGGAACTC 

TGGTTTAACATTACCGACACCGGAAGTAGCTCCCAAGTGTCrc 

AAATGGTCCACCAACCCATCTCCCCCAGACTCGTCXX^ 

GGTCAGGCAAGGCTATAGCCCTCTGTCCI^^ 

GACTCCACCATTAGGAGAGCCATCCTTGGACACAGAGT^ 

TGTTCCTTGGCTTTCTGGGTGCCGCTGGCT 

CCCTAGCAAAGACCTCATTGCTGAGATTCAGAAAG^^ 

TTCAAAAACGGAACCGTCCTGGTCX^ 

GCACCCTCAACTTTCCCAlTAGCAAAGGOUKrCC^ 

TAGGAAACAAAACCCTGACATGGTCATCTATCAGTATCCT 

CCXTGTGGACCCCAGCX3AAGTGGAAGAGACCAACAAGGGCGAA 

TTACCATTCCCTCC^CCAATAACX3AAACrc 

CACAATGGGAGCCXJCCAGCATGACCCT 

AATCTCCTGGAGGAGAATAGGGAAATCCTCAAAGATC 

TCGCTGAAATCCAAAAGCAAGGCACAGAGGAACTGTCCGCCro 

CAATAACCTCGCCGCTATTAGAATCCTGCAACAGCT^ 

ATTGGCATCATtTCGTCAGAGAAGGGCCAGAGCTCCCATO 

AGATGAAGGATTGCACTGAGAGACAGOCTAACTTTCTGGGAAA 

ACTGCATACCGGTGAGAGAGACTGGCACCTCGGC 

GATAGOXJCAACGAAAGCGAAGGCGACAGAGAAGAGCTC^ 

GCCCTGCCXrCXAGGGGACCCGATAGGCTGGAGAGAATCX^^ 

CAGGCTCGTGAATQGCAQTGAGGGCGAGGAAGTCA^ 

CATGGCATGGAAGACGAAGACAGAGAGGTCAATAGCGATAT 

GGGATTACXXjAAAGCAAATGGCTGACGATGACT^ 

TGCAAGCAjGAAAGCTGGGAGACGGAGGCGGAGCCC^ 

gagggacacattgccaaaagctgtagggcccctc^^ 

tgaaagactgtaccx3aaaggcaagccaatttcctcxxk^ 

tctccaaagcaaatggctctggtatat^ 

ccgcktccaataaccctgac^^ 

CCTTATCGTCGCCXrrCATC^TAGCCATTGTGG^ 
ttc 
B1 fragment

gg a t ccac cATCXTTCGAGAATATGCIX^ 

TOCCTQTGAAACTGAAACCCGQAATGGAT6GCGCCGCCACCTTT 

GAGAAGCGAACTGTATAAGTATAAGGTCGTG^ 

GTCAACACACCCCCACTGGTCAAGCTATGGTATC 

TCAACACGATCXntSAATCTTGTAGGAGGC 

CTCTCntrCrGTTTCTGGATGGCAT^ 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCC^ 

TGCTGGAGACATCOyVAGGCTGTCAGCAAATTG^ 

TGTCAAAACCATTATCGTCX1AACTCAACGAAA 

GGCAAGCTGGA£^X!CT(XX3AAAAGATTAGG 

GCCIX^GGGACTGGTTTACTCCAAAAAGAGG 

TAGATGGGQAACC^TGATCCTCGGCTTGGTGAT^ 

GGAGTGCCTGTGTGGAGGAGACAGCTCCTGTCOT 

CCCAAOUX^TCTGCTCCAGCTCACCG^ 

CATCTATGAGACATACGGAGACACATGGGCGGGAGTGGAAGCC 
CCTCrcCTCXX^TCCGTGAAAAAGCTCACCGAAGACAG 
GGATTATCGATATCATTGCATCCGACATTCAGAC^ 
TGTGTTTATCCATAACITT^ 

GCCACCGATATCATTCCCGTGGGCXjAAATCTATAAGAGA 

ATCTACCOnX^GCATTCTGGATATCAQM 

TCCCAGAGGCCCTGACAGACTCGGAGGCATTGAGGAA^ 

CCTCTGCCTCAGACAAGGGGAGACAATCCCACAGACCCT 

TGAATAAGGAACTGAAAAAOATTATCXX5ACAGGTCAGG 

TGCCATGCAGATGCTCAAGG^^ 

ccaOTCccccixrrcAcc^ 

CCTATAACACACCCATCITTGCCAT^ 

CTTCATTCACAATTTCAAAAGGAGAGGCGGAATCXX 

CACTTTAGGGAGCTCAAC^ 

ACCTCAGGAGCCTGTGTCTGTTCAGCTATCACA 

TAGCAGAATCGGC21TCACTAGGCAACCT 

GACCCCATTCCCATTCACTATTGCGCTCCCG 

AAGAGGATTGGCATCTGGGACAGGGAGTGTCCATCG^ 

CCTCGCCGATCAGCCTAGCCTCTATCCKXICT 

GCCGCTAGAAGGGCTATCCTCGGCCATATAGTCA 

CCCTGCAATACCTCGCACTCAGTCAACCCACAACCGCT^ 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAAT^ 

AQCAGGCAAGACXSAAGACGCAGCCAAGTACCATAGC^ 

TamZCCTAAGGAAATCGTC^^ 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGC^ 
TCCCTX3AAACCCIX3TGTGAAACTGACACCCCTCTGCGTCA
ACTCC^CCCAAGTGGACCCa^TC^^ 

AATCCATCCXATCXXKXAACACGGAATGGAGGATGAGGATA^ 

GCTCTCAGGCATATCGCTTCTAGTCCT 

TGAAACACTOGCCCCTCACCGAAGAGAAAATC^ 

GCAGTCaUSGCCTGAGCCTACCGCACC^ 

TTTTACAGACACC^TTACGATAGCCGACAC^ 

CTTGCCAAGCKXTTCGGCGGACCCAGTCACAAA^ 

CATTCCTCCC^TTGTGGCOVAAGJVGATTG^ 

CAGGTGAACTGTAGCCCTTCCGAGGGAACAAGACAGACT 

GGCAAATCCACTCCATCTXICGAGAGGATTCTGGGAC^ 

AAGCA(^CTGCAAGAGCAAATCGCATGGATGACAAGCAATCCCCCT 

AACCCTCAGTCCCAGGGCGTCXxIGGAAAGCATGAACAA 

ATCTCTGGGTCTACCATACXZCAAGGCrATTTCCCTGACrc 

TAGCAGAGAAAGACAGAGACAGATTCATTCTATTAACGAATGG^ 

CCIGTGCCTCTGCAACTGTATAAGACACTGAGAC^ 

CACTGCTCGTGCAAAACGCTAACXXrn^CTGTGA^ 

CXX3AAACGAAO?lQGTGGACAAACTGGTC7U3 

GAAAACX^X!ACCX5AGAACTTTAACAT6TGGAAAAACX^TA 

TGAAATGCAATAACAAAAGGTTCAACXXSAACTGGACC^ 
AGAGCTCAAGAATAGCGCTATCTCCCTXJCTCAAC^CTACCGCT 

GAAGTGGTTCAGTCCCGGCATCCCAAAGTGTCCA^ 

GGACATACTGGGGCCTCCACACAGGCGCTGCrATGG 

AGTGAGAGAGAGAATCAGACAGACACCCCCI^^ 

GGTGCCCATACXyUVTGACGTCAAGCAACTGACAGAGGCTC 

TGAAATACrrGGGGGAATCTGCTCCAGTACTGGGTC 

TVGCCATTQAGCTGCCTGAQAAAGAAAGCTGGACCGTC^ 

TCCCAGATTTACCCCGGAAQAGCXJATTGAGGCTCAGCAACACATC 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCT^ 

TAGCCAATACGCTCTAGGCATCATTCAGGCTC^ 

ATTTACAAGATCCTCACCGAATCTCAAAATCAAC^ 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCGTCGGCATTGGCGCT 

ACCCAAAATGATCGGAGGCATTGGAGGCTJTATCAAAGTC 

AACAAGGCTATCTCCTACCATAGGCrCAGGGATTO 

GCTCCCTGAAAGGCCTCCAGAGAGGCACACTGAATGCC^ 

AGTGATTCCCATGTTTTCCGCTCTGTCGGAGGGAGCX^ t c 
B2 fragment

ggatccaccATGCTCGAGAATATGCTCACCCAAATC 

TGCCTQTOAAACTGAAACX!aX^TGGATTC 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTC^ 

GTCAACACACTCCCACTXXnTM^^ 

TCAACACGATGCCGAATACTCTAGGATO 

CGCTGTCCTXn^CTGGATGGCACT 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGG^ 

TGCTGGAGACATCCGAAGGCTGTAAGCAAATTGCI^^ 

TGTCAAAACCATTATCGTCCACCTCAACX^ 

GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTCAGGCCTC 

GCCTGGACXXSACTGATTTACrCra 

TAGATGGGGAACCTTGATCCTCGGCTTGGTGATTATC^ 

GGAGTGCCTtnxntXSAGGAGACAGCT 

(XCAACAGCATCTGCrayVGCTCJVCra 

CATCTATGAGACATACGGAGACACATGCT 

CCTCCCCTCC(^TCanX5AAAAAGCTCACCGAA 

GGATTGTCXSATATCATTGCAACCGACATTC^ 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGC^ 

GCCAGCQATATCGTTCCCXmK5GCGATATCTATAAGAGAT^ 

ATTCACCCXntZAGCATTCTGQATATCAGAGTGA 

TCX:aW5AGGCCCTGACAGACTCX3AACGC^ 

(XTCTGTCTCAGACAAGGGGAGACAATCCCACAGACCCT 

TGAATAAGGAACTGAAAAAGATTATCGGACAGCTCAGGGAC^^ 

TGCCATGCAGATCXTTCAAGGATACCATTAACG 

CCX^TTGCCCCTCTCACCXSAGATTTGT^^ 

CCTATAACACACCCGTCTTTGCC^TTCAAGTGA^ 

CTTCATTCACAATTTCAAAAGGAAAGGCGGAA 

GACTTTAGGGAGCTCAACAAACCTACACA^ 

ACCK^GGAGCCTOTSTCTGTTCAGCT 

TAGCAGAATCGGCATOVCTAGGCAACGTAGAGGTAGGAACGG 

GACCCX!ATTCCCATTCACTATTGCGCTCC(XKrrGGC^ 

AAAAGGATTGGCATCrcGGACAGGGAGTGTCCATCG 

CCTCGCCXJATCAGCCTAGCCrCTATCCrrcCC^ 

GCaXHTVGAAGGGCTATCCTCGGCCAAAT^ 

CCCTGCAATACCTTGCACTCAGCO^^ 

TCAGGTCTGCTTCCIX^GAAGGGACTGGG^ 

AGCAGGCAAGACXSAAGACGCAGCCAAGTACCATAGC^ 

TCGTCXXrTAAGGAAATCGTCGCAAGTTG^ 

TGAAGCCGTGAGACACITTCCCAGACCXrrGGCTGCATGGC 
TCCCTGAAACCCTGTGTGA^

ACTCCACCCAAGTGGAC CCCGATCTGGCTX^CCATCTGATTCACCTCCACT 

MTCCATCXCATGGGCCTACACGGAATGGAGGATGAGGAAAGGQAAGTGC^ 

GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACCGTCCCOT 

TGAAACAGTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTG 

GCAGTCCAGGCCTGAGCCTACCGCACCCCC^ 

TTTTACAGACACCATTACGAAAGCCAACACCCTAAGOT^ 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGT^ 

CATTCCTCCCATTGTGCCCAAAGAGATTGTGGC^ 

CAGGTGGACTOTAGCCCITCCGAGGGATaUVGA^ 

GGCAAATCCXSCGCCATCTCCGAGTGGAT^ 

AAGCACACTGCAAGAGCAAATCGCATGGATGACAAACAA 

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCT^ 

ATCTCTGGGTCTACAATACCCAAGGCTTTTTCCCTGAC^ 

TAGCAGAGCAAGACAGAGACAGATTCATCCr^^ 

CCTGTGCCTCTGCAACTOTATAAGACACT 

CACTGCTCGTGCAAAACGCAAACCCTGACTGTG^ 

CX5GAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCACT 

GAAAACGTCACCX^GAACTTTAACATGTGGAAAAACAAT^ 

TGAAATGCAATAACAAAAAGTTCAACXX3AACTGGA 

AGAGCTCAAGAATAGCGCTGrCTrcCTGCTCAACGCTACC^^ 

GAAGTGGTTCAGTCCCAGCATCCCAAAGTGT^^ 

AGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTA^ 

AGTGAGAGAGAGAATCAGACAGACACCCCCIXKrCGCTGAGGGAG^ 

AGTGCCCATACCAATGACGTCAAGCAACTGACAGAGGTT^ 

AGCCATTGTGCIGCCTGAGAAAGAAGGCTGGACCGTC 

TCCCAGATTTACGCCXXaAAGAGCCATTGAGGCTX^ 

TGCAAGC(^AGTGCTCGCCATTGAGAGATACCTCGCCCTC 

TAGCCAATACGCTCTAGGCATCATTCAGGCTCAGCCT^ 

ATTTACAAGATCCTCACCXaAATCTCAAAATCAACAGGAT^ 

AGAGAAGGGTCGTGKMAGGGAAAAGCGltXX 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTA^ 

CAGAAGGCTATCTCCTACCATAGGCTCAGGGATTTC^ 

GCTCCCTtSAGAGGCCTCOMAGAGGCA 

AGTGATTCCCATGTTTACCXXrrCTGTCCGAGGGA aagat c tgaatt c 
C1 fragment

gg a t c c a c cATGCTCGAGAGCAAGACACCQX^ 

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCTTGGAGGG 

AGGCTTTGAGAGAG<XCTCCTAG(XGCCGAATGG^ 

CAAATGAGAGAGCCCAGAGGAAGCGATATCGCTC^^ 

TCAGCTTGrTTTCTGAAAGAGAAAG^ 

CGAAGACCAAAGCCCTCAGAGAGAGCCTTACAATGAGTGGA^^ 

CAAGGCCAATGGACCTACCAAATCTTTCAGGAACCCT^^ 

GCXXTTCACACAAACTGGATGACAGAAACCXrrcCTGGTC 

TCTGGGAACCGGAGCCACACTXK5AAGAGCCTGAGGTCATCC 

CAAGACCTGAATACGATGCTCAACATCGTCAGCGGACA 

ATAACCCTCCCATCXXrTGTCGGAGAGATTTACAAAAGGTGGATTATC 

CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGA 

CAAAAGGAAACCTGGGAGGCTTGGTGGACGGAATACTG 

CCCCTCCCCKX5TGTTTCCCGATTGGCATAACT 

GTGCTTTAAGCTCGTGCCTGTGGACCCCTVAACTGTGCT 

TTTTACGTGGACXXSAGCCGCCAAaVGAGAGA^ 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCXn^ 

CTGGGCTACCCATGKTCTGTGTGCGTACCGATCCCAATCC^ 

GATCAGAAACTCCItXXK^TTTGGGGAT 

GGTCCAACCAAGCTCXSCCATAACAAAG^ 

7UVTCAAACCCCCTCTGCCTAGCGTTAAGACAATCA 

CCTAACAATAACACAAGGAAAGCCGCXX5CTAGTGAAGT CAGCAAGCTGCCG 

CXJGATACAGGCGACKXIAGCCAGCT 

CX3CTTGTTGGTGGGCCAATATCAAACAGGA6CT 

GGCGCTGCCAATAGGGAAACCCAACTGGGAAAGGCGGGCTATG 

GAATCTGGCAGCTCGACTGTACCCATCTGGAAGGCAAAff 

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCT^ 

ATCAATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGACT 

CTGTCTCCXIAGGATCTGGATAAGTACGGAGCCCTC 

TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTO^ 

ACCCCTAAGTTTAAGTTCCCCATTCAGAAAGAGAC^ 

ACAGACTGATCAGCTGTAACAOUVGCGTTATCAAA 

TTACTGTGCCCX!TCCTAGCTGGATGGGCT 

GAAAAGGACTCCTGGACAGTGAATGACATTOVGAAAT^ 

AAATGATGACAGCATGTCAGGGAGTGGGAGGCCCT 

CATTTGGAAAGGCCCTGCCAAACTGCrCTGGAAAC^ 

CAACTGATAGAAGCCCTCCTGGATACAGGAGCCX^TGACAC 

GAATCAAACAGCTCXIAGGCTAGGGTCCTGGCT 

CTCTAGCGGAAAGGCTGCTATGGAAAACAGAT^ 
ACATCXxAATAGCCTCGTGAAACACCAT^

ATAAGTCCTTCXSAAGAGATTTGGAATAACATGACCTGGAT^ 

TGTGTATATCGAATACAAGAAACTGCTCAGGCAAAGGAGAA 

CTGGAAACXXXrTGAQGGATOTAAACAGATCCTGGAACAGCT^ 

CTAOTAGAAAGCIXXTTOAAACAGAGAAAGATTGA 

CAATGAGTCCGAGGGAGACACACCCGGAATC^^ 

CCCATTTTCCAAAGCTCCATGACCCAAATCCrcATGA 

AGTGCITCAACKnXXSAAAGGAAGGCX^ 

CTCCX^GGATAGCX^CACCTCCGGCACACAGCAAAGCCA^ 

GTGGCCAGCGGATATATOSAAGCCX^^ 

TTAAGCCICTGGTCAGCACACAGCTCCT^ 

CTTTACXXxATAACAAACTGGTCG^^ 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACTCCrC^^ 

ATGTGAATGCrocrcAAACCAGAGGCGATAACCCTACro 

agagacagacccttgtgacgccxk:ccctagctccaa^^ 
ccccctctggaaaggctccacctcgactgtagcgaaga 

GGTTCAATATCACCAACIXXXnTGTGGTACATTAAGATTTTCA 

GTACTCACCrGTCTCCATCCIX^ 

AAGCTCCTOTGGAAGGGAGAG^ 

CTAAGATTATCGAACTGAATAAGAGAAC 

GAAGAAGAAAAAGTCAGTGACyVGTGGCCGCTAT^^ 

TCXXXX^CAATGATTCTGGGACTGGTCATCA 

GGGGTACAAAGGCTCTGACAGAGATTGTGACACrc 

CTCCCGCCTXXTCCCTGAGACATAT^ 

CTGGGACXKrrCC^CCTCAAGGGACTGCAAA 

GGGGCTCTAGCCTGGGGCAACrGCAACCTGCTCr^^ 

CGCTACCCTCTGGTGTGTGCATCAGGAGCTCTACAAATACAAA^ 

ACCAGAGCCAAAAGOAGAGTGGTCGAGAGAGAGAAAAGGC^ 

TGGAGCTGGAGGAAAACAGAGAGATTCTCAGGGAACCCOT 

CCAAGTCAACAATGCCAAC^TCATGATCX^ 

GAGGAGGTCGGCITCCCCGTCAGGCCCC^^ 

TCTTCAGACAGGGACCCAAAGAGCCTTTCAGAGACT 

CTCACACX5AACTGAAAAACTXX»^^ 

GrrGTGGGCCTCCAGGGAACTGGAAAGGTTT^ 

CCGAGTCCXSAGCTOn^AATCAGATTATCGAAGAG 

CATTGAGGTCGACCAAAGGGCTTGGAGAGCCATTCTGAATAT^^ 

AGGTGGCCCGTCAGGACAATCTATACCGATAACGGAAGCAATT^ 

GGGCTGATGTCAAACAGCTCACCGCAGTCGTCCAGAA 

OVAGTTCAGACKXrCTATCGCT^ 
C2 fragment

gga t CcaccATGCTCGAGAGCAA(^CAGCCGCTAACAATACCGATTC 

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCCGGGAGGGCTAT 

AGGCCrrTGAGAGAGCCCTCCTAGCCXKXGAATCX^ 

CAAATGAGAGAGCCCAGGGGAAGCGA^ 

TCAGCTTGTTTCTGAAAQAGAAAG 

CXSAAGACCAAAGCTCTCAGAGAGAGCCTTAC^ 

CAAGGCCAATGGACCTTCCAAATCTTTCAGGAACCCTO 

GCXXHXIACACAAACnXX^TGACAGATACCCTCCrrc 

TCTGGGACCCGGAGCCTCACTGGAAGAGCCT^ 

CAAGACXJTGAATATGATGCTCAACACCGTCGG 

ATAArcCTCCXIATCCCTGTCGGAGAGATTTACAAAA 

CXXKXTTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATC 

CAAAGGGAAACCTGGGAGGCTTGGTGGATGGAATACTCG CAG<KrrACCTGGATTCCTGAGGGGGAGTTTGTGAATA 

CCCCTCCXCTCGTGTTTCCCGATTGGCAAAACTATACCC 

GTGCTTTAAGCTXXnXKrCTGTGGACCCCAAACTGTG 

TTTTACXXXJGACGCytfXXXK2CAACAG 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCA 

CTGGGCTArcCATGCCTGTGTGCCT 

GATCAGAAACrCCTCGGCATTTGGGGATG^ 

GGTCCAACCCAGCTOGCCATAACAAAGTGGGAAGCCrc 

AATCAAACCCCCrCTGCCTAGCGTTAAGAC^TC^ 

CCTIAACAATAACACAAGG&CAGCCGCCGCTA^ 

CCGATACAGGCAGCTCCAGCAAGGTCAGCCAAAACT 

CXXTl"ltiTlXjGTGGGCCAATATCAAACAGGAGTT^ 

GGCGCTGCCAATAGGGAAACCAAACTGGGAAAGGCTGG 

GAATCTGGCAGCIXX^CTGTACCCATCTGAA 

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTC 

ATCGATAAGGCTOU3GAAGAGCACGAAGTCAGGGAAW 

CTGTCTCCCAGGATCTGGATAAGTACGGAGCCATCA 

TGGCXntX^AACCCTCAGATTTTGGG 

ACCCXTrAAGTTTAAGCTCCCCATTCAGAAAGA^ 

ACAGACTGATCAGCTGTAACACAAGCGTTATCAC^ 

TTACTGTCCCCCTCCTAGCnMATGGGCT 

GAAAAGGAGTCCTGGACAGTGAATGACATTCAGAAAACAAT^ 

AAAATATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCT 

CATTTCGAAAGGCCCTGCCAAACTGCTCTGGAAA 

CAACTGAAAGAAGCCCTCCTGGATACAGGAGCCGATGAC^ 

GAATCAAACAGCTCCAGGCTAGGCTCCTGGCTATTO 

CTGTAGCGGAAAGGCTGCTATGGAAAACAGATGGCAAGTGATGA 
ACATGGAATAGCCTCGTGAAAC^

ATAACmxrTTCGAAGAGATTTGGAATAACATGACCTGG^ 

TGTGTTTATCGAATACAAGAAACnXSCTCAGGCAAW 

CTGGAAACCGCTGAGGGATGTAAACAGATCCTGGAACA^ 

CTAGTAGAAAGCTCCTGAGACAGAGAAAGATTQACAGACTC 

CAATGAGTCCXSAGGGAGACACACCCGGAATCAGAT^ 

GCCATTTTCCAAAGCTCCATGACC 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCT^ 

CTCCGAGOATAGCGACACCTCCGGCACACAGCAAAGCCA^ 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCCTGCCGAAACTO 

TTAAGCCTGTGGTCAGCACACAGCTCCTGCTCA^ 

CTTTACCAATAACAAACTGGTCX3GCAAACnX3AATTGTC 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACCCCTCTGTGTGTG^ 

ATGTQAATGCTGCTCAACCCAGAGGCGATAACCCTACCX^TC 

AGAGACAGACCCTTTTGACGCCGC^ 

CCCCCTCTGGAAAGGCTCCACCTCGACTGTAGCGiAAGAC^ 

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTT^ 

GTACC^UVCCTGTCTCCATCCTCGACATTAAGCA 

AAGCrCCTGTGGAAGGGAGAGGOA^CGTCGTC 

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGG 

GAAAAAGAAAAAGTCCGTGACAGTGGCCGCTATGAGAGTGAAAGAG^ 

TGGGGCACAATGATTCTGGGACTGOT 

GGGGTGCAAAGGCTCTGATAGACATTGTC 

CTCCCACCTCGCCCTGAGACATATCGCXIAGGGAACTGCATCC^ 

CntSGGACGCTCCAGCXTTCAAGGAACrGCGAA 

GGGGCTCTAGCCTGGAGCAACTGCAATCTGCTCT 

CGCTACCCTCTGGTGTtnXK^TCAGGAGCTCT^ 

ACCAAAGCCAAAAGGAGAGTGGTCCAGAGAGAGAAAAGGCTCACCGATA 

TGGAGCTGGAGGAAAACAGAGAGATTC^ 

CCAAGCC^VACAATtXTCAACATCATG^^ 

GAGGGGGTCGGCTTCCCCGTCAGGCCTCAGGTCCCAC^ 

TCTTCAAACAGGGLACCCAAAGAGCCTTTCAGAC^ 

CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGAC^ 

GTGTGGGCCrCCAGGGAACTGGAAAGGTTTGCCT^ 

CCGAGTCCGAGCTCGTGAGTCAGATT^ 

CATTGAGGTCGTCOUUU3GGCTTGGAGAGC 

AGGTGGCCCGTCAAGATAATCCATACCGATAACGQAAGCAATTTC^ 
GGGCTGATGTGAAACAGCTCACCGA7VGTCGTTCAGA 

CAAG TTCAG ACAGCCT ATCGCT6 CXX3CCAGCAACG AGAACATGGACGC CATGGCTGCTt gaagatctgaattc 